When looking at the time course data, we noticed an interesting pattern: it seemed as though it would be possible to predict an individual’s group from the difference between their load effect during encoding and their load effect during delay. It seemed as though low capacity subjects had small load effects at both encoding and delay, medium capacity subjects had large load effects at both encoding and delay and high capacity subjects had large load effects at encoding but small ones at delay. As such, we wanted to create a model that used this information to predict span.

To do so, we extracted data from TR 6 for the encoding period and TR 8 for the delay period. This represented 1 TR’s worth of data in the middle of where we’d expect activity from for each period (from our model of the task convolved with a hemodynamic delay function) and where we see maximal differences between group. In addition to the raw load effects, we created two composite variables that we expected to capture the differences between groups. The first was simply the difference between load effects at encoding and delay, and the second was the sum of that difference and the load effect effect at encoding. These measures allow us to create a series of regression models to predict span, BPRS total score and accuracy at high load.

In addition to the univariate load effects, we have also seen a relationship with various measures reflecting multivariate representation across and within trial, including the probability of a MVPA classifer predicting a face at any given point in a trial and the similarity of multivariate representations across and within trials. Both of these measures were taken at each individual trial and averaged, in addition to taken from the average over many trials (which served as a template for the canonical trial type). We added these multivariate measures to see if they improved model performance when added to regression models that only contained variables based on univariate load effects.

Finally, we added in structural measures, resting state measures and EEG measures to try to explain variance at multiple different levels/modalities.

Overall, we were best able to explain variance in high load accuracy, with our best model able to explain 26% of the variance. In contrast, we were only able to explain 21% of the variance in span, and 12% of the variance in BPRS scores.

The individual components related to each of these measures will be discussed further below, but one interesting thing to note is that there is largely not much overlap. There is some overlap in the EEG measures between BPRS and span, but no overlap between span and accuracy or accuracy and BPRS. Another interesting thing to note is that when comparing PCs related to accuracy and span, those related to accuracy seem to be very much driven by the encoding and measures related to visual processing, while PCs related to span include more connectivity/multivariate measures and are also driven by delay period, or processing in DFR regions.

library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2     ✓ purrr   0.3.4
## ✓ tibble  3.0.3     ✓ dplyr   1.0.1
## ✓ tidyr   1.1.1     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ─────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(psych)
## 
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
library(reshape2)
## 
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
## 
##     smiths
library(rmatio)
library(factoextra)
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library(patchwork)
library(MASS)
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:patchwork':
## 
##     area
## The following object is masked from 'package:dplyr':
## 
##     select
library(caret)
## Loading required package: lattice
## 
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
## 
##     lift
library(glmnet)
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## Loaded glmnet 4.0-2
library(tidymodels)
## ── Attaching packages ─────────────────────────────────────────────────────────────────────────────────────── tidymodels 0.1.1 ──
## ✓ broom     0.7.0      ✓ recipes   0.1.13
## ✓ dials     0.0.8      ✓ rsample   0.0.7 
## ✓ infer     0.5.3      ✓ tune      0.1.1 
## ✓ modeldata 0.0.2      ✓ workflows 0.1.3 
## ✓ parsnip   0.1.3      ✓ yardstick 0.0.7
## ── Conflicts ────────────────────────────────────────────────────────────────────────────────────────── tidymodels_conflicts() ──
## x psych::%+%()             masks ggplot2::%+%()
## x scales::alpha()          masks psych::alpha(), ggplot2::alpha()
## x scales::discard()        masks purrr::discard()
## x Matrix::expand()         masks tidyr::expand()
## x dplyr::filter()          masks stats::filter()
## x recipes::fixed()         masks stringr::fixed()
## x dplyr::lag()             masks stats::lag()
## x caret::lift()            masks purrr::lift()
## x Matrix::pack()           masks tidyr::pack()
## x yardstick::precision()   masks caret::precision()
## x yardstick::recall()      masks caret::recall()
## x MASS::select()           masks dplyr::select()
## x yardstick::sensitivity() masks caret::sensitivity()
## x yardstick::spec()        masks readr::spec()
## x yardstick::specificity() masks caret::specificity()
## x recipes::step()          masks stats::step()
## x Matrix::unpack()         masks tidyr::unpack()
load('data/behav.RData')
load('data/structural_measures.RData')
FA_Data <- read.csv("data/RDoC_DTI_FA_JHU_ICBM.csv")
load('data/EEG_for_PCs.RData')
load('data/connectivity_data.RData')
load('data/load_effects_DFR.RData')
load('data/split_groups_info.RData')
load('data/ITC_fusiform.RData')
similarity_fusiform <- similarity_temp
load('data/ITC_DFR_delay.RData')
similarity_DFR <- similarity_temp 
load("data/MVPA_fusiform.RData")
individual_trial_probs_fusiform <- individual_trial_averages_probs
averages_from_template_fusiform <- averages_from_template
load("data/MVPA_DFR_delay_mask.RData")
individual_trial_probs_DFR <- individual_trial_averages_probs
averages_from_template_DFR <- averages_from_template
load("data/MVPA_HPC.RData")
individual_trial_probs_HPC <- individual_trial_averages_probs
averages_from_template_HPC <- averages_from_template

DFR_ROIs <- c("L aMFG","L dlPFC","L dMFG","L IPS","L preSMA","R dlPFC","R IPS","R medParietal","all delay ROIs")

source("helper_fxns/select_period_average.R")
source("helper_fxns/calc_r_squared.R")

Prep data

First, we’re going to load in the time courses of the activity from the DFR regions. We’re doing this a little differently from last time, because we don’t want the interpolated data, we just want the one with 14 TRs.

We’re going to remove the subjects that aren’t included in the split group analysis, and put the data in a slightly easier to use format.

temp <- read.mat('data/RSA_DFR_trials.mat')
factors <- matrix(nrow=170)

for (idx in seq.int(1,170)){
  if (constructs_fMRI$PTID[idx] %in% WM_groups[["high"]]$PTID){
    factors[idx] <- "high"
  }else if (constructs_fMRI$PTID[idx] %in% WM_groups[["med"]]$PTID){
    factors[idx] <- "med"
  }else if (constructs_fMRI$PTID[idx] %in% WM_groups[["low"]]$PTID){
    factors[idx] <- "low"
  }else{
    factors[idx] <- "not_incl"
  }
}

factors <- factor(factors, levels=c("low","med","high","not_incl"))
trial_activity_high <- temp[["trial_avg_activity_high"]]
trial_activity_low <- temp[["trial_avg_activity_low"]]

all_data <- list(subjs=constructs_fMRI$PTID,
                 WMC = factors, 
                 L_aMFG = list(
                   high = data.frame(trial_activity_high[,,1,1]),
                   low = data.frame(trial_activity_low[,,1,1])
                 ),
                 L_dlPFC = list(
                   high = data.frame(trial_activity_high[,,1,2]),
                   low = data.frame(trial_activity_low[,,1,2])
                 ),
                 L_dMFG = list(
                   high = data.frame(trial_activity_high[,,1,3]),
                   low = data.frame(trial_activity_low[,,1,3])
                 ), 
                 L_IPS = list(
                   high = data.frame(trial_activity_high[,,1,4]),
                   low = data.frame(trial_activity_low[,,1,4])
                 ), 
                 L_preSMA = list(
                   high = data.frame(trial_activity_high[,,1,5]),
                   low = data.frame(trial_activity_low[,,1,5])
                 ), 
                 R_dlPFC = list(
                   high = data.frame(trial_activity_high[,,1,6]),
                   low = data.frame(trial_activity_low[,,1,6])
                 ),
                 R_IPS = list(
                   high = data.frame(trial_activity_high[,,1,7]),
                   low = data.frame(trial_activity_low[,,1,7])
                 ),
                 R_medParietal = list(
                   high = data.frame(trial_activity_high[,,1,8]),
                   low = data.frame(trial_activity_low[,,1,8])
                 ),
                 all_delay = list(
                   high = data.frame(trial_activity_high[,,1,9]),
                   low = data.frame(trial_activity_low[,,1,9])
                 )
)

Now, we’re going to calculate load effects for each ROI at every time point.

for (ROI in seq.int(3,11)){
  all_data[[ROI]][["load_effect"]] <- all_data[[ROI]][["high"]]-all_data[[ROI]][["low"]]
  temp <- data.frame(group = all_data[["WMC"]], all_data[[ROI]][["load_effect"]])
  all_data[[ROI]][["SVM"]] <- temp
}

Now, let’s pull out the encoding and delay values - we’re also going to calculate the difference between encoding and delay, and encoding + (encoding-delay), because we think that this is going to be useful. We’re going to include all these measures, plus span, in a convenient dataframe.

for (ROI in seq.int(3,11)){
  temp <- data.frame(matrix(nrow=170, ncol=6))
  colnames(temp) <- c("group","encoding","delay","encoding_delay","encoding_delay_comb" ,"span")
  temp$group <- all_data[["WMC"]]
  temp$span <- constructs_fMRI$omnibus_span_no_DFR_MRI
  
  temp$encoding <- all_data[[ROI]][["load_effect"]][,6]
  #temp$delay <- rowMeans(all_data[[ROI]][["load_effect"]][,8:9])
  
  temp$delay <- all_data[[ROI]][["load_effect"]][,8]
  temp$encoding_delay <- temp$encoding - temp$delay
  temp$encoding_delay_comb <- temp$encoding + temp$encoding_delay 
  temp$high_acc <- p200_data$XDFR_MRI_ACC_L3[p200_data$PTID %in% constructs_fMRI$PTID]
  
  temp$BPRS <- p200_clinical_zscores$BPRS_TOT[p200_clinical_zscores$PTID %in% constructs_fMRI$PTID]
  
  all_data[[ROI]][["SVM_2"]] <- temp
  
}

Relationship between Span and Encoding - Delay

All regions except for L aMFG have significant relationships

ROI_plot_list <- list()

for (ROI in seq.int(3,11)){
  correlation_test <- cor.test(all_data[[ROI]][["SVM_2"]]$span,all_data[[ROI]][["SVM_2"]]$encoding_delay)
  
  ROI_plot_list[[DFR_ROIs[ROI-2]]] <- ggplot(data=all_data[[ROI]][["SVM_2"]])+
    geom_point(aes(x=span,y=encoding_delay_comb,color=group))+
    stat_smooth(aes(x=span,y=encoding_delay_comb),method="lm",color="black")+
    ggtitle(paste(DFR_ROIs[ROI-2], ", r = ",round(correlation_test$estimate,digits=3)))+
    xlab("WM Span")+
    ylab("LE Difference")+
    theme_classic()
  
}

(ROI_plot_list[[1]] + ROI_plot_list[[2]]) /
  (ROI_plot_list[[3]] + ROI_plot_list[[4]]) +  
  plot_annotation(title = "Correlation between (encoding LE + encoding/delay LE difference) and span ")+
  plot_layout(guides="collect")
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'

(ROI_plot_list[[5]] + ROI_plot_list[[6]]) /
  (ROI_plot_list[[7]] + ROI_plot_list[[8]]) +
  plot_layout(guides="collect")
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'

ROI_plot_list[[9]]   
## `geom_smooth()` using formula 'y ~ x'

Explore data

data_for_reg <- data.frame(span = all_data[["all_delay"]][["SVM_2"]]$span,
                           high_acc = all_data[["all_delay"]][["SVM_2"]]$high_acc, 
                           BPRS = all_data[["all_delay"]][["SVM_2"]]$BPRS,
                           L_aMFG_enc = all_data[[3]][["SVM_2"]]$encoding, 
                           L_aMFG_delay = all_data[[3]][["SVM_2"]]$delay, 
                           L_dlPFC_enc = all_data[[4]][["SVM_2"]]$encoding, 
                           L_dlPFC_delay = all_data[[4]][["SVM_2"]]$delay,
                           L_dMFG_enc = all_data[[5]][["SVM_2"]]$encoding, 
                           L_dMFG_delay = all_data[[5]][["SVM_2"]]$delay,
                           L_IPS_enc = all_data[[6]][["SVM_2"]]$encoding, 
                           L_IPS_delay = all_data[[6]][["SVM_2"]]$delay,
                           L_preSMA_enc = all_data[[7]][["SVM_2"]]$encoding, 
                           L_preSMA_delay = all_data[[7]][["SVM_2"]]$delay,
                           R_dlPFC_enc = all_data[[8]][["SVM_2"]]$encoding, 
                           R_dlPFC_delay = all_data[[8]][["SVM_2"]]$delay,
                           R_IPS_enc = all_data[[9]][["SVM_2"]]$encoding, 
                           R_IPS_delay = all_data[[9]][["SVM_2"]]$delay,
                           R_medPar_enc = all_data[[10]][["SVM_2"]]$encoding, 
                           R_medPar_delay = all_data[[10]][["SVM_2"]]$delay,
                           all_delay_enc = all_data[[11]][["SVM_2"]]$encoding, 
                           all_delay_delay = all_data[[11]][["SVM_2"]]$delay,
                           L_aMFG_enc_delay = all_data[[3]][["SVM_2"]]$encoding_delay, 
                           L_dlPFC_enc_delay = all_data[[4]][["SVM_2"]]$encoding_delay, 
                           L_dMFG_enc_delay = all_data[[5]][["SVM_2"]]$encoding_delay,
                           L_IPS_enc_delay = all_data[[6]][["SVM_2"]]$encoding_delay, 
                           L_preSMA_enc_delay = all_data[[7]][["SVM_2"]]$encoding_delay, 
                           R_dlPFC_enc_delay = all_data[[8]][["SVM_2"]]$encoding_delay, 
                           R_IPS_enc_delay = all_data[[9]][["SVM_2"]]$encoding_delay,
                           R_medPar_enc_delay = all_data[[10]][["SVM_2"]]$encoding_delay,
                           L_aMFG_enc_delay_comb = all_data[[3]][["SVM_2"]]$encoding_delay_comb, 
                           L_dlPFC_enc_delay_comb = all_data[[4]][["SVM_2"]]$encoding_delay_comb, 
                           L_dMFG_enc_delay_comb = all_data[[5]][["SVM_2"]]$encoding_delay_comb,
                           L_IPS_enc_delay_comb = all_data[[6]][["SVM_2"]]$encoding_delay_comb, 
                           L_preSMA_enc_delay_comb = all_data[[7]][["SVM_2"]]$encoding_delay_comb, 
                           R_dlPFC_enc_delay_comb = all_data[[8]][["SVM_2"]]$encoding_delay_comb, 
                           R_IPS_enc_delay_comb = all_data[[9]][["SVM_2"]]$encoding_delay_comb,
                           R_medPar_enc_delay_comb = all_data[[10]][["SVM_2"]]$encoding_delay_comb, 
                           R_all_delay_enc_delay_comb = all_data[[11]][["SVM_2"]]$encoding_delay_comb, 
                           high_corr_fus_enc = similarity_fusiform[["high_correct_avg"]]$X6, 
                           high_corr_fus_delay = similarity_fusiform[["high_correct_avg"]]$X8, 
                           high_incorr_fus_enc = similarity_fusiform[["high_incorrect_avg"]]$X6,
                           high_incorr_fus_del = similarity_fusiform[["high_incorrect_avg"]]$X8,
                           low_corr_fus_enc = similarity_fusiform[["low_correct_avg"]]$X6,
                           low_corr_fus_del = similarity_fusiform[["low_correct_avg"]]$X8, 
                           correct_encoding_to_correct_delay_fus = similarity_fusiform[["correct_encoding_to_correct_delay"]],
                           correct_enc_to_delay_fus_high_corr = similarity_fusiform[["correct_encoding_to_delay_avg"]][,4],
                           enc_to_correct_delay_fus_high_corr = similarity_fusiform[["encoding_to_correct_delay_avg"]][,4],
                           high_corr_DFR_enc = similarity_DFR[["high_correct_avg"]]$X6, 
                           high_corr_DFR_delay = similarity_DFR[["high_correct_avg"]]$X8, 
                           high_incorr_DFR_enc = similarity_DFR[["high_incorrect_avg"]]$X6,
                           high_incorr_DFR_del = similarity_DFR[["high_incorrect_avg"]]$X8,
                           low_corr_DFR_enc = similarity_DFR[["low_correct_avg"]]$X6,
                           low_corr_DFR_del = similarity_DFR[["low_correct_avg"]]$X8, 
                           correct_encoding_to_correct_delay_DFR = similarity_DFR[["correct_encoding_to_correct_delay"]],
                           correct_enc_to_delay_DFR_high_corr = similarity_DFR[["correct_encoding_to_delay_avg"]][,4],
                           enc_to_correct_delay_DFR_high_corr = similarity_DFR[["encoding_to_correct_delay_avg"]][,4],
                           indiv_trial_avgs_high_correct_fusiform_enc = individual_trial_probs_fusiform[["high_correct"]]$V6,
                           indiv_trial_avgs_high_correct_fusiform_delay = individual_trial_probs_fusiform[["high_correct"]]$V8,
                           indiv_trial_avgs_high_correct_fusiform_probe = individual_trial_probs_fusiform[["high_correct"]]$V11,
                           indiv_trial_avgs_high_correct_DFR_enc = individual_trial_probs_DFR[["high_correct"]]$V6,
                           indiv_trial_avgs_high_correct_DFR_delay = individual_trial_probs_DFR[["high_correct"]]$V8,
                           indiv_trial_avgs_high_correct_DFR_probe = individual_trial_probs_DFR[["high_correct"]]$V11,
                           indiv_trial_avgs_high_incorrect_fusiform_enc = individual_trial_probs_fusiform[["high_incorrect"]]$V6,
                           indiv_trial_avgs_high_incorrect_fusiform_delay = individual_trial_probs_fusiform[["high_incorrect"]]$V8,
                           indiv_trial_avgs_high_incorrect_fusiform_probe = individual_trial_probs_fusiform[["high_incorrect"]]$V11,
                           indiv_trial_avgs_high_incorrect_DFR_enc = individual_trial_probs_DFR[["high_incorrect"]]$V6,
                           indiv_trial_avgs_high_incorrect_DFR_delay = individual_trial_probs_DFR[["high_incorrect"]]$V8,
                           indiv_trial_avgs_high_incorrect_DFR_probe = individual_trial_probs_DFR[["high_incorrect"]]$V11,
                           indiv_trial_avgs_low_correct_fusiform_enc = individual_trial_probs_fusiform[["low_correct"]]$V6,
                           indiv_trial_avgs_low_correct_fusiform_delay = individual_trial_probs_fusiform[["low_correct"]]$V8,
                           indiv_trial_avgs_low_correct_fusiform_probe = individual_trial_probs_fusiform[["low_correct"]]$V11,
                           indiv_trial_avgs_low_correct_DFR_enc = individual_trial_probs_DFR[["low_correct"]]$V6,
                           indiv_trial_avgs_low_correct_DFR_delay = individual_trial_probs_DFR[["low_correct"]]$V8,
                           indiv_trial_avgs_low_correct_DFR_probe = individual_trial_probs_DFR[["low_correct"]]$V11,
                           averages_from_template_high_correct_fusiform_enc = averages_from_template_fusiform[["high_correct"]]$V6,
                           averages_from_template_high_correct_fusiform_delay = averages_from_template_fusiform[["high_correct"]]$V8,
                           averages_from_template_high_correct_fusiform_probe = averages_from_template_fusiform[["high_correct"]]$V11,
                           averages_from_template_high_correct_DFR_enc = averages_from_template_DFR[["high_correct"]]$V6,
                           averages_from_template_high_correct_DFR_delay = averages_from_template_DFR[["high_correct"]]$V8,
                           averages_from_template_high_correct_DFR_probe = averages_from_template_DFR[["high_correct"]]$V11,
                           averages_from_template_high_incorrect_fusiform_enc = averages_from_template_fusiform[["high_incorrect"]]$V6,
                           averages_from_template_high_incorrect_fusiform_delay = averages_from_template_fusiform[["high_incorrect"]]$V8,
                           averages_from_template_high_incorrect_fusiform_probe = averages_from_template_fusiform[["high_incorrect"]]$V11,
                           averages_from_template_high_incorrect_DFR_enc = averages_from_template_DFR[["high_incorrect"]]$V6,
                           averages_from_template_high_incorrect_DFR_delay = averages_from_template_DFR[["high_incorrect"]]$V8,
                           averages_from_template_high_incorrect_DFR_probe = averages_from_template_DFR[["high_incorrect"]]$V11,
                           averages_from_template_low_correct_fusiform_enc = averages_from_template_fusiform[["low_correct"]]$V6,
                           averages_from_template_low_correct_fusiform_delay = averages_from_template_fusiform[["low_correct"]]$V8,
                           averages_from_template_low_correct_fusiform_probe = averages_from_template_fusiform[["low_correct"]]$V11,
                           averages_from_template_low_correct_DFR_enc = averages_from_template_DFR[["low_correct"]]$V6,
                           averages_from_template_low_correct_DFR_delay = averages_from_template_DFR[["low_correct"]]$V8,
                           averages_from_template_low_correct_DFR_probe = averages_from_template_DFR[["low_correct"]]$V11, 
                           PTID = constructs_fMRI$PTID
)

data_for_reg <- merge(data_for_reg, p200_FFA, all = TRUE)
data_for_reg <- data_for_reg[,c(2:111,1)]
name_list <- c("alpha_cue_Favg", "alpha_cue_Oz", "alpha_delay_Favg", "alpha_delay_Oz",  "alpha_probe_Favg", "alpha_probe_Oz", "beta_cue_Favg", "beta_cue_Oz", "beta_delay_Favg", "beta_delay_Oz", "beta_probe_Favg", "beta_probe_Oz",  "theta_cue_Favg", "theta_cue_Oz",  "theta_delay_Favg", "theta_delay_Oz", "theta_probe_Favg", "theta_probe_Oz", "low_gamma_cue_Favg", "low_gamma_cue_Oz",  "low_gamma_delay_Favg", "low_gamma_delay_Oz",  "low_gamma_probe_Favg", "low_gamma_probe_Oz", "cue_O_n170", "probe_O_n170", "cue_P3", "probe_P3")

EEG_list <- list(alpha_cue_average_Favg, alpha_cue_average_Oz,alpha_delay_average_Favg, alpha_delay_average_Oz, alpha_probe_average_Favg, alpha_probe_average_Oz,beta_cue_average_Favg, beta_cue_average_Oz, beta_delay_average_Favg, beta_delay_average_Oz, beta_probe_average_Favg, beta_probe_average_Oz,theta_cue_average_Favg, theta_cue_average_Oz, theta_delay_average_Favg, theta_delay_average_Oz, theta_probe_average_Favg, theta_probe_average_Oz,low_gamma_cue_average_Favg, low_gamma_cue_average_Oz,low_gamma_delay_average_Favg, low_gamma_delay_average_Oz,  low_gamma_probe_average_Favg, low_gamma_probe_average_Oz,cue_average_O_n170, probe_average_O_n170, cue_average_P3, probe_average_P3 )

for (idx in seq.int(1,length(EEG_list))){
  colnames(EEG_list[[idx]])[2:3] <- paste0(name_list[idx],"_",colnames(EEG_list[[idx]][2:3]),sep="") 
  EEG_list[[idx]] <- EEG_list[[idx]][1:3]
}

EEG_df <- Reduce(function(x,y) merge(x = x, y = y, all=TRUE), EEG_list)
EEG_df <- merge(EEG_df, CDA[,1:10], all.x=TRUE, by.x="PTID", by.y="subID")
colnames(aparc_LH_MTHICK)[2:37] <- paste0(colnames(aparc_LH_MTHICK)[2:37],"_LH", sep="")
colnames(aparc_RH_MTHICK)[2:37] <- paste0(colnames(aparc_RH_MTHICK)[2:37],"_RH", sep="")
colnames(FA_Data)[1] <- "ID"

struc_df <- Reduce(function(x,y) merge(x = x, y = y, by="ID"),
                   list(aparc_LH_MTHICK, aparc_RH_MTHICK,aseg))

# remove STD from FA 
FA_Data <- FA_Data[,1:21]
FA_Data <- data.frame(FA_Data)
RS_df <- Reduce(function(x,y) merge(x=x, y=y, by = "PTID"), 
                list(p200_BCT_forCorr,p200_indiv_network_ParticCoeff, p200_all_RS))
RS_df <- data.frame(RS_df)

From these plots, we can see that our measures are high correlated.

Within encoding load effects

pairs.panels(data_for_reg[,c(4,6,8,10,12,14,16,18)], density=TRUE)

Within delay load effects

pairs.panels(data_for_reg[,c(5,7,9,11,13,15,17,19)], density=TRUE)

Encoding - Delay

pairs.panels(data_for_reg[,c(22:29)], density= TRUE)

Within TR Similarity

pairs.panels(data_for_reg[,c(39:44,49:53)])

Across TR Similarity

pairs.panels(data_for_reg[,c(45:47,53:56)])

Within Inidivdual MPVA trials - fusiform

pairs.panels(data_for_reg[,c(57:59,63:65,69:71)])

Within Inidivdual MPVA trials - delay

pairs.panels(data_for_reg[,c(60:62,66:68,72:74)])

Within MPVA templates - fusiform

pairs.panels(data_for_reg[,c(75:77,81:83,87:89)])

Within MPVA templates - delay

pairs.panels(data_for_reg[,c(78:80,84:86,90:92)])

Run PCAs

Univariate effects

We know that our variables are highly correlated, so to deal with multi-collinearity, we’re going to try to run a PCA on all the encoding load effects, delay load effects and encoding - delay. It looks like the first 3 dimensions carry most of the variance (77%), so we’re going to stick with those three.

PC1: encoding load effects and the combined encoding+encoding-delay in the PFC regions and load effect during delay PC2: delay load effect PC3: encoding - delay in parietal regions and fusiform activity during delay

res.pca <- prcomp(data_for_reg[!is.na(data_for_reg$L_CUE_LE),c(4:19, 22:37, 106:110)], scale = TRUE)
fviz_eig(res.pca)

summary(res.pca)
## Importance of components:
##                           PC1    PC2    PC3     PC4     PC5     PC6     PC7
## Standard deviation     3.8317 2.8690 1.7755 1.36044 1.16149 1.08876 0.99447
## Proportion of Variance 0.3968 0.2225 0.0852 0.05002 0.03646 0.03204 0.02673
## Cumulative Proportion  0.3968 0.6193 0.7045 0.75450 0.79096 0.82300 0.84973
##                            PC8     PC9    PC10    PC11    PC12    PC13    PC14
## Standard deviation     0.90027 0.80958 0.78333 0.76012 0.71656 0.65841 0.61265
## Proportion of Variance 0.02191 0.01771 0.01658 0.01562 0.01388 0.01172 0.01014
## Cumulative Proportion  0.87163 0.88935 0.90593 0.92155 0.93542 0.94714 0.95729
##                           PC15    PC16    PC17    PC18    PC19   PC20   PC21
## Standard deviation     0.54920 0.53414 0.52176 0.47660 0.45344 0.3894 0.3700
## Proportion of Variance 0.00815 0.00771 0.00736 0.00614 0.00556 0.0041 0.0037
## Cumulative Proportion  0.96544 0.97315 0.98051 0.98664 0.99220 0.9963 1.0000
##                             PC22      PC23      PC24      PC25      PC26
## Standard deviation     1.001e-15 3.286e-16 3.286e-16 3.286e-16 3.286e-16
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion  1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
##                             PC27      PC28      PC29      PC30      PC31
## Standard deviation     3.286e-16 3.286e-16 3.286e-16 3.286e-16 3.286e-16
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion  1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
##                             PC32      PC33      PC34      PC35      PC36
## Standard deviation     3.286e-16 3.286e-16 3.286e-16 3.286e-16 3.286e-16
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion  1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
##                             PC37
## Standard deviation     2.265e-16
## Proportion of Variance 0.000e+00
## Cumulative Proportion  1.000e+00
res.var <- get_pca_var(res.pca)
res.var$contrib        # Contributions to the PCs
##                              Dim.1      Dim.2        Dim.3        Dim.4
## L_aMFG_enc              5.28856171 0.37902778 6.157961e-01  0.030928862
## L_aMFG_delay            0.87514835 7.50434194 6.055440e-01  0.039678359
## L_dlPFC_enc             5.06817204 0.14920941 1.474970e+00  0.853601769
## L_dlPFC_delay           0.90616054 7.41643706 3.362529e-02  0.518352425
## L_dMFG_enc              3.85523406 0.83486192 7.684441e-01  0.104593574
## L_dMFG_delay            0.13088997 7.22260516 3.919021e-02  1.214790066
## L_IPS_enc               4.72681474 1.17282759 7.032161e-01  0.240384223
## L_IPS_delay             1.09744544 7.85264082 9.672027e-02  0.195400514
## L_preSMA_enc            5.05333421 0.26759021 1.130414e+00  1.019022415
## L_preSMA_delay          0.89809066 6.88015495 5.319551e-01  0.008823488
## R_dlPFC_enc             4.73935292 0.46232139 6.660689e-01  1.053421889
## R_dlPFC_delay           0.49515939 7.85774530 8.012236e-04  2.625527821
## R_IPS_enc               4.66486401 0.83847976 2.642182e+00  1.514621778
## R_IPS_delay             1.36497712 7.09776775 2.901157e-01  0.292800119
## R_medPar_enc            3.31551649 0.76575484 6.344154e+00  2.466090678
## R_medPar_delay          0.96321486 5.64743173 1.919158e-01  2.370647978
## L_aMFG_enc_delay        2.36929567 4.42896532 2.672597e+00  0.153234907
## L_dlPFC_enc_delay       2.59163366 4.78890802 2.369753e+00  0.118092733
## L_dMFG_enc_delay        2.48207674 3.53125606 1.158124e+00  0.675105861
## L_IPS_enc_delay         2.61058199 3.35530941 2.102327e+00  1.319237730
## L_preSMA_enc_delay      2.80635222 3.28843831 3.493193e+00  1.114055576
## R_dlPFC_enc_delay       3.06685615 3.87068237 7.844988e-01  0.183640832
## R_IPS_enc_delay         1.78102740 3.71978587 6.944083e+00  0.816801621
## R_medPar_enc_delay      1.10671832 2.64765832 1.201193e+01  0.017176427
## L_aMFG_enc_delay_comb   4.88816300 0.62517072 1.860527e+00  0.102057952
## L_dlPFC_enc_delay_comb  4.76316585 0.85743222 2.345169e+00  0.530553472
## L_dMFG_enc_delay_comb   4.28945870 0.32556806 1.307867e+00  0.086371746
## L_IPS_enc_delay_comb    4.87986967 0.04436567 1.595501e+00  0.788727952
## L_preSMA_enc_delay_comb 4.77915244 0.40562367 2.534902e+00  1.297075970
## R_dlPFC_enc_delay_comb  4.92923812 0.40923925 9.098191e-01  0.145731304
## R_IPS_enc_delay_comb    4.29112476 0.16765401 5.764064e+00  1.573412492
## R_medPar_enc_delay_comb 2.90033053 0.09957716 1.172958e+01  1.115201299
## L_DELAY_LE              0.05009231 2.05525124 8.483684e+00  0.054992419
## L_PROBE_LE              0.34549630 0.26018331 1.996965e+00 36.768046879
## R_CUE_LE                1.28086178 0.12786195 2.991150e+00  4.079700415
## R_DELAY_LE              0.01481527 2.47568695 7.542708e+00  0.200204539
## R_PROBE_LE              0.33075262 0.16618447 3.266446e+00 34.311891921
##                                Dim.5        Dim.6        Dim.7        Dim.8
## L_aMFG_enc              7.235748e-01 3.333941e-01  0.076352007 1.232438e-01
## L_aMFG_delay            3.509194e+00 3.505166e+00  0.191677299 8.052987e-01
## L_dlPFC_enc             2.247951e+00 1.214441e+00  0.178245671 6.168010e-01
## L_dlPFC_delay           3.096310e+00 1.564725e+00  0.041271626 3.638735e-01
## L_dMFG_enc              7.555407e-01 1.962421e+00 21.386992431 2.233940e-04
## L_dMFG_delay            5.663422e-02 7.538875e+00  0.880083005 2.609825e-01
## L_IPS_enc               2.746733e+00 2.718373e+00  0.903965109 3.606078e+00
## L_IPS_delay             1.107531e-01 8.913786e-04  1.010416142 2.087395e-02
## L_preSMA_enc            3.439487e+00 1.812855e+00  0.431125808 1.366172e+00
## L_preSMA_delay          4.472804e+00 1.024015e+00  0.030826740 1.029907e+00
## R_dlPFC_enc             1.553245e-01 4.610671e+00  6.130948222 4.560212e-02
## R_dlPFC_delay           1.955810e-01 6.977731e-01  3.109724574 2.896851e-01
## R_IPS_enc               2.813481e+00 3.283540e-02  1.481955541 6.846243e-02
## R_IPS_delay             2.573756e-02 3.684334e+00  2.006973302 1.436737e+00
## R_medPar_enc            2.645726e-02 8.054388e+00  1.432804500 4.746231e+00
## R_medPar_delay          7.548182e-01 1.466718e+01  0.452721314 2.901865e+00
## L_aMFG_enc_delay        9.503316e-01 1.610421e+00  0.549479290 1.655631e+00
## L_dlPFC_enc_delay       1.337411e-04 1.048503e-03  0.448037126 9.034739e-02
## L_dMFG_enc_delay        1.232796e+00 2.118649e+00 13.115449973 2.965985e-01
## L_IPS_enc_delay         3.133557e+00 4.745080e+00  0.005486302 6.949513e+00
## L_preSMA_enc_delay      2.663738e-02 3.565503e-01  0.815290534 1.549722e-01
## R_dlPFC_enc_delay       1.121652e-04 2.527492e+00  1.111881389 8.115718e-02
## R_IPS_enc_delay         3.592793e+00 3.932094e+00  0.014352583 1.111219e+00
## R_medPar_enc_delay      6.094053e-01 8.755190e-01  0.436380784 2.007792e+01
## L_aMFG_enc_delay_comb   9.338731e-04 1.255883e-01  0.326366141 8.378011e-01
## L_dlPFC_enc_delay_comb  7.773512e-01 4.483822e-01  0.363823456 3.889196e-01
## L_dMFG_enc_delay_comb   1.344132e+00 1.406865e-03 23.287538758 1.079815e-01
## L_IPS_enc_delay_comb    3.809198e+00 4.618671e+00  0.424499095 6.444259e+00
## L_preSMA_enc_delay_comb 1.372867e+00 1.207308e+00  0.727558988 7.906878e-01
## R_dlPFC_enc_delay_comb  5.733618e-02 4.481425e+00  4.113548089 5.927253e-04
## R_IPS_enc_delay_comb    4.211532e+00 8.238112e-01  0.514702925 1.418843e-01
## R_medPar_enc_delay_comb 9.791675e-02 1.651287e+00  1.216770171 1.406914e+01
## L_DELAY_LE              2.494714e+01 2.679230e-01  2.602743312 2.265574e+00
## L_PROBE_LE              1.866535e+00 3.968173e-01  0.547784215 1.788630e+00
## R_CUE_LE                2.880839e-01 1.493516e+01  6.995797414 1.730693e+01
## R_DELAY_LE              2.373783e+01 4.439149e-01  2.612406526 7.324654e+00
## R_PROBE_LE              2.812990e+00 1.009115e+00  0.024019640 4.335506e-01
##                                Dim.9       Dim.10       Dim.11       Dim.12
## L_aMFG_enc              4.7403682128  3.435853901  0.645747346 1.258687e+01
## L_aMFG_delay            0.1022843459  0.275358333  4.452872554 7.565803e-01
## L_dlPFC_enc             6.1339999504  0.001158843  4.795202808 1.266258e-01
## L_dlPFC_delay           0.4750592954  0.058596715  4.619899062 5.614144e-01
## L_dMFG_enc              0.6115759785  0.488957205 12.883977103 2.401838e-02
## L_dMFG_delay            8.2619717943  5.383452106 24.885901479 4.646307e-01
## L_IPS_enc               1.8015282819  0.122223628  1.295426406 1.163041e-02
## L_IPS_delay             0.7862711121  0.133244429  5.303409596 6.151801e-01
## L_preSMA_enc            1.4003851037  0.191712331  0.007564968 3.940104e+00
## L_preSMA_delay          0.0942600343  4.455330527  0.113601695 6.849629e+00
## R_dlPFC_enc             5.0519867188  2.015592989  2.943127200 3.239123e-02
## R_dlPFC_delay           0.2780421437  0.336534587  2.515110290 5.714481e+00
## R_IPS_enc               0.3657507564  4.427474332  0.086977127 9.519518e-01
## R_IPS_delay             1.7802120684  1.901354683  0.006643663 7.082740e-01
## R_medPar_enc            0.0073671973  2.814297534  0.079022550 1.460115e-01
## R_medPar_delay          1.5769303534  1.537065589  3.522246932 3.768368e-02
## L_aMFG_enc_delay        7.1619046996  2.168272245  1.600779663 8.728553e+00
## L_dlPFC_enc_delay       4.4762643940  0.073575206  0.150571423 1.037054e-01
## L_dMFG_enc_delay        4.8565707177  9.614501685  2.582926611 7.410024e-01
## L_IPS_enc_delay         0.5184295088  0.768447518  1.398669226 5.876767e-01
## L_preSMA_enc_delay      1.1178755869  5.880622444  0.044029170 2.150467e+01
## R_dlPFC_enc_delay       9.1711846470  1.048735441  0.141530671 4.516625e+00
## R_IPS_enc_delay         0.6156373361  1.047192532  0.211103016 5.853498e-02
## R_medPar_enc_delay      1.7021435003  0.352241279  3.130491863 5.476547e-02
## L_aMFG_enc_delay_comb   7.6332142429  3.645881995  0.047530351 1.390815e+01
## L_dlPFC_enc_delay_comb  6.6712276236  0.026661043  2.254121456 1.663086e-03
## L_dMFG_enc_delay_comb   0.7023760846  4.969722200  1.323711477 3.552705e-01
## L_IPS_enc_delay_comb    1.5025960407  0.438129185  0.021847291 9.700450e-02
## L_preSMA_enc_delay_comb 1.5419930981  2.291628063  0.003340600 1.272759e+01
## R_dlPFC_enc_delay_comb  8.6393182886  1.919118071  1.481916154 1.040484e+00
## R_IPS_enc_delay_comb    0.0003204776  3.516387721  0.181450002 5.616918e-01
## R_medPar_enc_delay_comb 0.4093412217  1.874856079  0.584018209 1.328134e-01
## L_DELAY_LE              1.5050236585  3.965963478  8.009741706 7.716488e-04
## L_PROBE_LE              0.3603784552  0.363208307  2.056479269 5.406233e-01
## R_CUE_LE                4.7721446400 15.904188758  6.070594026 3.560886e-01
## R_DELAY_LE              1.5194951649  4.712187595  0.497032249 5.162128e-04
## R_PROBE_LE              1.6545672660  7.840271421  0.051384787 4.543332e-01
##                               Dim.13       Dim.14       Dim.15       Dim.16
## L_aMFG_enc               0.004942722 6.111994e-01  0.291138336  2.905130844
## L_aMFG_delay             0.181753977 3.351148e-01  4.950751265  1.767548528
## L_dlPFC_enc              0.015390615 1.033148e+01  2.453533077  0.005971895
## L_dlPFC_delay            6.425510247 8.386005e-04 21.251019408  1.861595358
## L_dMFG_enc               1.524325953 1.930649e-01  0.007267323  1.358969381
## L_dMFG_delay             3.975753296 2.752255e-01  0.172521182  0.026159430
## L_IPS_enc                4.979903324 2.298592e+00  0.823857605  0.033922025
## L_IPS_delay             17.389288373 2.053338e+00  2.671671575  0.379212169
## L_preSMA_enc             3.101910349 3.816055e+00  0.052787959 12.487445744
## L_preSMA_delay           6.937524297 2.110274e+00  2.203359389 24.912742307
## R_dlPFC_enc              4.878489724 1.163335e+00  0.460779110  3.864602622
## R_dlPFC_delay           14.078496871 5.076940e+00  2.082044401  9.574100241
## R_IPS_enc                1.092678372 7.211521e-02  1.362776200  0.385554152
## R_IPS_delay              6.156503926 4.590853e-01  3.272504828  2.115890483
## R_medPar_enc             0.448502986 2.437865e+00  0.013030811  4.198369891
## R_medPar_delay           0.564597503 5.932819e+00  1.386958255 15.133143705
## L_aMFG_enc_delay         0.257080608 6.734976e-02  2.777688648  0.247644894
## L_dlPFC_enc_delay        5.295426102 1.280982e+01  7.184248530  1.960774257
## L_dMFG_enc_delay         0.707483929 1.155771e-02  0.119735913  0.980893750
## L_IPS_enc_delay          3.685161471 9.897426e-02  9.424875883  0.223249731
## L_preSMA_enc_delay       0.173809325 7.752942e-01  1.204344655  0.322789336
## R_dlPFC_enc_delay        1.393031532 1.155976e+01  0.415738729  0.658849605
## R_IPS_enc_delay          2.425905071 1.237197e+00 12.448333226  0.815456690
## R_medPar_enc_delay       0.002523359 7.794713e-01  1.401378711  3.745491697
## L_aMFG_enc_delay_comb    0.102586967 3.680977e-01  0.354366156  1.651974642
## L_dlPFC_enc_delay_comb   1.301360680 1.437740e+01  0.236988180  0.615630974
## L_dMFG_enc_delay_comb    0.051253661 3.734515e-02  0.023562446  1.592558876
## L_IPS_enc_delay_comb     0.238394024 1.291970e+00  4.466231543  0.013423942
## L_preSMA_enc_delay_comb  0.666228469 2.564860e+00  0.188574185  3.157054979
## R_dlPFC_enc_delay_comb   0.468800965 5.966504e+00  0.003706432  0.541712950
## R_IPS_enc_delay_comb     0.018194213 5.566510e-01  6.597777736  0.004716956
## R_medPar_enc_delay_comb  0.155178364 2.613717e-01  0.311079446  0.066520739
## L_DELAY_LE               0.888118260 4.903343e-01  2.050408051  1.374975915
## L_PROBE_LE               3.179673732 4.594869e+00  5.356561963  0.294915270
## R_CUE_LE                 1.965964339 4.921954e+00  0.352585656  0.324648545
## R_DELAY_LE               4.677800799 6.132269e-02  0.356659522  0.386063646
## R_PROBE_LE               0.590451596 5.498393e-04  1.269153663  0.010293830
##                               Dim.17       Dim.18       Dim.19       Dim.20
## L_aMFG_enc               0.819956252  6.712597431 2.771298e+00  0.998570027
## L_aMFG_delay             0.273219736 29.775191172 4.065740e+00  3.853551882
## L_dlPFC_enc              1.284275486  4.502267555 2.619560e-01  0.059085349
## L_dlPFC_delay           11.099093905  7.404365417 2.038516e-01  2.884395109
## L_dMFG_enc               0.134104230  0.292675715 3.820262e-03  0.023880504
## L_dMFG_delay             1.145062169  1.353966336 1.469560e+00  0.014215902
## L_IPS_enc                1.838011228  0.506583016 2.878481e+00  5.116167037
## L_IPS_delay              0.538124090  0.374162829 4.493794e+00 11.833215959
## L_preSMA_enc             1.298700757  0.042171262 2.843870e-02  0.041547328
## L_preSMA_delay           5.621474969  0.381659055 2.472948e+00  1.641858268
## R_dlPFC_enc              0.312594419  0.505476602 3.239705e-01  2.625838621
## R_dlPFC_delay            1.456635766  0.587259284 1.875881e+00 10.988572520
## R_IPS_enc                6.148053628  0.126717662 2.725024e-01  9.149755841
## R_IPS_delay              1.052214499  6.006016181 5.780177e-01 13.016592430
## R_medPar_enc             4.608332345  0.008462164 6.838228e-01  0.100805581
## R_medPar_delay           5.430198824  1.331317870 2.198332e-01  0.393882881
## L_aMFG_enc_delay         2.271162545  7.384452807 5.749725e-02  0.818967758
## L_dlPFC_enc_delay        3.746795099  0.056930874 1.950862e-02  1.855828013
## L_dMFG_enc_delay         0.556289874  0.447806337 1.738432e+00  0.076819149
## L_IPS_enc_delay          6.816998649  2.665622283 5.997146e-02  1.073549088
## L_preSMA_enc_delay       0.773212073  0.635289193 2.652054e+00  0.885688106
## R_dlPFC_enc_delay        3.241133355  2.376054280 4.845638e-01  1.996637779
## R_IPS_enc_delay          3.590798857  5.651312117 5.157355e-02  0.158866669
## R_medPar_enc_delay       0.005381931  1.400098416 2.041656e-01  1.161573191
## L_aMFG_enc_delay_comb    1.864661210  0.002046960 7.337314e-01  0.009232233
## L_dlPFC_enc_delay_comb   0.122736111  1.278020402 1.419456e-01  0.330497238
## L_dMFG_enc_delay_comb    0.050354381  0.005986440 6.569175e-01  0.063993351
## L_IPS_enc_delay_comb     4.729655312  1.617959816 9.509130e-01  0.897181380
## L_preSMA_enc_delay_comb  0.045000619  0.284704030 8.894349e-01  0.135726040
## R_dlPFC_enc_delay_comb   1.654498756  1.535427103 9.844379e-04  0.045045203
## R_IPS_enc_delay_comb     6.581460438  0.998618246 4.705840e-02  2.994843237
## R_medPar_enc_delay_comb  1.714325433  0.326472279 5.770632e-01  0.596728250
## L_DELAY_LE               0.036919945  3.692173297 2.187440e+01  2.484354457
## L_PROBE_LE               0.978943492  0.610263014 1.534934e+01  7.624649298
## R_CUE_LE                16.104849513  1.000994101 5.369060e-02  0.113091628
## R_DELAY_LE               0.289977858  7.795232109 1.465331e+01  3.084229090
## R_PROBE_LE               1.764792245  0.323646347 1.619954e+01 10.850563603
##                               Dim.21       Dim.22       Dim.23       Dim.24
## L_aMFG_enc               0.144577003 2.491759e+01 0.000000e+00 0.000000e+00
## L_aMFG_delay             0.075646201 1.200522e+01 4.045232e-01 2.984372e-01
## L_dlPFC_enc              0.474113707 1.045908e+00 2.845963e+00 9.993705e+00
## L_dlPFC_delay            0.017973076 7.575030e-01 9.521463e-01 7.563448e+00
## L_dMFG_enc               0.181426138 5.075256e-01 4.313550e+00 1.761025e+00
## L_dMFG_delay             0.111615408 6.516712e-01 3.863739e+00 7.559816e-02
## L_IPS_enc                1.957235908 7.519844e+00 1.289790e+00 7.704619e-01
## L_IPS_delay              9.258832739 1.141002e+00 7.800398e-01 6.043188e+00
## L_preSMA_enc             0.279280837 3.954356e+00 1.759697e+00 1.062290e+01
## L_preSMA_delay           0.374055540 8.970408e-01 2.374211e-01 5.470634e+00
## R_dlPFC_enc              0.466731299 1.190213e+00 1.331297e+01 1.821224e+00
## R_dlPFC_delay            0.455025389 7.506348e-01 3.762975e+00 1.847291e-01
## R_IPS_enc                3.340854305 1.778964e-01 9.778525e-01 2.886026e+00
## R_IPS_delay             12.069030627 3.032558e-02 8.015839e+00 9.008005e-01
## R_medPar_enc             0.227226864 3.801884e-01 2.552165e-01 5.437817e-02
## R_medPar_delay           0.377130643 2.707326e-01 5.745155e-02 1.449692e-02
## L_aMFG_enc_delay         0.474699835 4.992457e+00 1.582767e+00 1.167687e+00
## L_dlPFC_enc_delay        0.414240844 7.395274e+00 2.743363e-01 8.387683e+00
## L_dMFG_enc_delay         0.005884252 7.053250e-01 2.891606e+00 3.482193e+00
## L_IPS_enc_delay          2.886477128 6.823045e-02 4.281912e-01 1.279774e+01
## L_preSMA_enc_delay       0.001468519 9.796320e-02 5.013371e+00 4.988863e+00
## R_dlPFC_enc_delay        0.011654823 6.486993e-01 5.264503e-01 1.041924e-01
## R_IPS_enc_delay          2.989771940 1.392165e-03 1.702240e+01 7.864644e-02
## R_medPar_enc_delay       0.015505696 2.106519e+00 2.204927e-07 3.117851e-04
## L_aMFG_enc_delay_comb    0.365121335 4.711160e+00 1.311019e+00 9.672050e-01
## L_dlPFC_enc_delay_comb   0.558483778 1.183691e+01 8.615898e-01 5.610230e-03
## L_dMFG_enc_delay_comb    0.086202313 1.091623e-02 1.102825e-01 7.428880e+00
## L_IPS_enc_delay_comb     0.003567181 5.680499e+00 4.852669e-02 8.517459e+00
## L_preSMA_enc_delay_comb  0.108658881 1.944313e+00 1.083106e+01 3.918407e-01
## R_dlPFC_enc_delay_comb   0.213982551 2.555272e-02 5.714920e+00 2.078399e+00
## R_IPS_enc_delay_comb     0.076977051 1.336515e-01 1.039293e+01 1.113664e+00
## R_medPar_enc_delay_comb  0.054378258 3.443476e+00 1.613786e-01 2.857113e-02
## L_DELAY_LE              12.899412631 2.530792e-31 4.814825e-31 7.225247e-31
## L_PROBE_LE              14.719639212 1.203706e-31 4.627047e-30 2.023430e-30
## R_CUE_LE                 0.053659280 4.814825e-33 1.012317e-30 1.560003e-30
## R_DELAY_LE              17.613953448 1.083336e-30 2.708339e-31 2.330375e-30
## R_PROBE_LE              16.635505357 1.925930e-32 8.137054e-31 4.814825e-33
##                               Dim.25       Dim.26       Dim.27       Dim.28
## L_aMFG_enc              0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## L_aMFG_delay            7.166360e-01 8.910139e-01 6.073838e-02 3.034881e-01
## L_dlPFC_enc             3.806208e+00 1.273468e+00 1.500117e+00 7.174468e+00
## L_dlPFC_delay           8.066024e-05 6.077014e-02 1.621906e-01 3.335568e+00
## L_dMFG_enc              2.305233e-01 6.726809e-02 3.681434e-02 3.008419e-03
## L_dMFG_delay            4.941686e+00 4.579809e-01 3.928729e-01 3.889499e-01
## L_IPS_enc               5.379813e+00 7.804933e-03 1.745099e-01 2.079653e+00
## L_IPS_delay             9.413069e-01 5.468281e-02 7.543571e-01 2.469022e+00
## L_preSMA_enc            7.697369e+00 3.927005e+00 1.461034e+00 7.817103e-01
## L_preSMA_delay          1.873093e+00 1.425819e+00 3.080841e-01 1.035822e+00
## R_dlPFC_enc             1.052322e-03 3.407421e-04 8.334136e+00 8.670113e-02
## R_dlPFC_delay           3.683579e-02 9.533498e-05 5.393603e+00 1.852654e+00
## R_IPS_enc               4.051594e+00 6.056658e-01 1.116070e+01 7.408948e+00
## R_IPS_delay             4.991417e+00 4.595268e+00 4.988216e+00 1.856445e+00
## R_medPar_enc            3.093183e+00 4.858458e-01 2.387531e+01 6.775718e+00
## R_medPar_delay          3.531411e-02 4.515128e-01 1.480371e+01 1.360775e+01
## L_aMFG_enc_delay        2.803962e+00 3.486245e+00 2.376494e-01 1.187449e+00
## L_dlPFC_enc_delay       3.087763e+00 2.303524e+00 6.492829e-02 1.980200e+00
## L_dMFG_enc_delay        1.447126e+01 1.090229e+00 1.965639e+00 1.317025e+00
## L_IPS_enc_delay         1.113415e-02 1.132783e-01 1.395132e+00 2.589006e+00
## L_preSMA_enc_delay      2.915070e-01 1.904942e+01 2.104839e-02 2.085902e+00
## R_dlPFC_enc_delay       1.335176e-01 1.336617e-03 4.807039e+00 6.420336e+00
## R_IPS_enc_delay         5.115662e+00 9.610283e+00 1.415350e+00 3.036904e-02
## R_medPar_enc_delay      1.329882e+00 3.723123e-01 7.486350e+00 1.923286e+01
## L_aMFG_enc_delay_comb   2.322545e+00 2.887687e+00 1.968470e-01 9.835743e-01
## L_dlPFC_enc_delay_comb  1.097285e+01 5.717370e+00 1.627402e+00 8.618490e-01
## L_dMFG_enc_delay_comb   7.991258e+00 4.443522e-01 1.843850e+00 8.647555e-01
## L_IPS_enc_delay_comb    3.609451e+00 7.296358e-02 7.476399e-01 2.527401e-01
## L_preSMA_enc_delay_comb 3.407232e+00 3.480867e+01 7.941199e-01 4.080296e-01
## R_dlPFC_enc_delay_comb  1.006430e-01 2.503249e-03 1.334176e-01 4.585257e+00
## R_IPS_enc_delay_comb    3.825526e-01 5.734929e+00 2.123182e+00 3.830921e+00
## R_medPar_enc_delay_comb 6.172671e+00 3.573775e-04 1.734010e+00 4.209825e+00
## L_DELAY_LE              3.277090e-31 5.484386e-32 4.701977e-32 3.130840e-30
## L_PROBE_LE              3.900008e-31 2.891904e-31 1.647874e-30 1.738152e-30
## R_CUE_LE                1.203706e-31 1.203706e-31 4.478991e-30 2.547042e-30
## R_DELAY_LE              3.009266e-32 3.478711e-31 6.093763e-31 8.902611e-30
## R_PROBE_LE              1.733337e-31 5.825938e-31 3.009266e-32 7.703720e-32
##                               Dim.29       Dim.30       Dim.31       Dim.32
## L_aMFG_enc              0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## L_aMFG_delay            4.435744e-01 7.858842e-01 8.959597e-01 1.859400e-01
## L_dlPFC_enc             1.954221e+01 2.636715e+00 9.655867e-02 1.607464e+00
## L_dlPFC_delay           9.761216e+00 1.256853e-02 1.537981e+00 7.817782e-01
## L_dMFG_enc              9.025525e+00 1.944740e+01 1.285796e+00 1.015507e+00
## L_dMFG_delay            5.154529e+00 2.244502e+00 3.580931e+00 1.091253e+00
## L_IPS_enc               1.384053e+00 9.170298e+00 1.191494e-02 1.542111e+01
## L_IPS_delay             1.520974e+00 5.490685e+00 2.037201e+00 1.474735e+00
## L_preSMA_enc            3.416075e-01 2.439720e+00 6.973070e+00 5.011995e+00
## L_preSMA_delay          6.394295e-01 4.395822e-01 6.338308e+00 1.961774e+00
## R_dlPFC_enc             4.604881e-01 1.369244e+00 1.253289e+00 6.890838e+00
## R_dlPFC_delay           8.835225e-01 7.023752e-01 4.176843e+00 7.928372e-01
## R_IPS_enc               4.008396e+00 7.789292e-01 7.080140e-02 2.689563e+00
## R_IPS_delay             8.412724e-02 1.336861e+00 6.919868e-02 2.130348e-03
## R_medPar_enc            6.446743e-02 1.876001e+00 1.620577e-02 1.461672e+00
## R_medPar_delay          8.375948e-01 1.162606e+00 5.432419e-02 1.084159e+00
## L_aMFG_enc_delay        1.735561e+00 3.074907e+00 3.505596e+00 7.275223e-01
## L_dlPFC_enc_delay       6.508079e+00 1.474067e+00 8.140890e+00 4.998376e-01
## L_dMFG_enc_delay        1.835276e+00 2.400959e+00 6.291471e+00 1.000762e+00
## L_IPS_enc_delay         1.509437e+00 2.973725e+00 6.517631e+00 8.453791e-01
## L_preSMA_enc_delay      1.518343e+00 4.736037e-03 1.005845e+01 1.174450e+00
## R_dlPFC_enc_delay       1.755906e+00 4.576263e-01 1.023511e+01 2.663799e-01
## R_IPS_enc_delay         4.489747e+00 1.687031e+00 4.515973e-01 1.543214e+00
## R_medPar_enc_delay      3.421372e+00 5.874843e-01 2.746722e-01 8.313675e+00
## L_aMFG_enc_delay_comb   1.437580e+00 2.546972e+00 2.903715e+00 6.026129e-01
## L_dlPFC_enc_delay_comb  1.730574e+00 6.358702e+00 8.797607e+00 1.605363e-01
## L_dMFG_enc_delay_comb   2.019292e+00 2.598695e+01 1.353903e+00 1.416074e-04
## L_IPS_enc_delay_comb    1.060189e-01 3.695065e-01 7.030869e+00 1.569347e+01
## L_preSMA_enc_delay_comb 4.759262e-01 1.602739e+00 6.501637e-01 7.472820e-01
## R_dlPFC_enc_delay_comb  4.593310e-01 1.212209e-01 4.299479e+00 7.283939e+00
## R_IPS_enc_delay_comb    1.307646e+01 3.234054e-01 7.374202e-01 6.171262e+00
## R_medPar_enc_delay_comb 3.769383e+00 1.365917e-01 3.530396e-01 1.349677e+01
## L_DELAY_LE              2.437505e-30 3.081488e-31 1.807440e-32 3.510007e-30
## L_PROBE_LE              3.900008e-31 4.814825e-33 4.814825e-33 3.900008e-31
## R_CUE_LE                1.560003e-30 3.254822e-30 5.085659e-30 7.136774e-30
## R_DELAY_LE              3.900008e-31 6.933348e-31 4.701977e-32 1.203706e-33
## R_PROBE_LE              3.510007e-30 3.478711e-31 1.806312e-31 5.403437e-30
##                               Dim.33       Dim.34       Dim.35       Dim.36
## L_aMFG_enc              0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## L_aMFG_delay            4.427554e+00 1.880998e+00 2.321548e-01 3.075290e-01
## L_dlPFC_enc             2.708211e+00 1.036987e-01 2.199278e+00 3.726118e-01
## L_dlPFC_delay           2.026855e+00 1.569075e-01 3.277099e-01 1.146939e+00
## L_dMFG_enc              2.168059e-01 3.554496e+00 5.310239e+00 5.418119e+00
## L_dMFG_delay            3.826961e-03 8.931840e-04 6.130452e+00 5.610559e+00
## L_IPS_enc               3.211794e-03 6.839875e-01 3.365702e+00 6.180774e+00
## L_IPS_delay             2.123239e-01 1.366030e+00 5.902072e+00 2.675705e+00
## L_preSMA_enc            3.036762e+00 3.899460e+00 3.556822e+00 1.337515e-01
## L_preSMA_delay          1.776582e+00 4.242897e-01 3.394650e+00 7.477198e-03
## R_dlPFC_enc             9.302562e-01 1.470640e+01 4.657563e+00 1.515112e+00
## R_dlPFC_delay           1.737174e+00 3.682869e+00 3.972594e-01 4.748050e+00
## R_IPS_enc               1.458469e+00 1.535464e-01 1.596501e+00 1.999135e+01
## R_IPS_delay             2.548810e+00 1.204240e-01 1.457366e+00 3.656490e+00
## R_medPar_enc            1.141213e+01 4.794173e+00 4.660134e-01 1.757839e+00
## R_medPar_delay          4.119013e-01 1.217734e+00 2.874940e-01 9.089988e-02
## L_aMFG_enc_delay        1.732357e+01 7.359727e+00 9.083457e-01 1.203260e+00
## L_dlPFC_enc_delay       2.223300e+00 2.872251e-01 1.750374e-02 2.825268e+00
## L_dMFG_enc_delay        1.220655e-01 3.823562e+00 6.030721e+00 4.952132e+00
## L_IPS_enc_delay         5.677660e-01 7.064158e+00 7.792901e+00 8.419738e-01
## L_preSMA_enc_delay      1.888727e+00 1.034746e-01 5.579926e+00 1.803881e-02
## R_dlPFC_enc_delay       3.409512e+00 2.667911e-01 1.039266e+01 1.140855e+01
## R_IPS_enc_delay         3.248232e+00 8.453655e-01 1.167065e+00 7.868104e-02
## R_medPar_enc_delay      2.935836e+00 1.381320e-02 1.442797e-01 3.408844e-01
## L_aMFG_enc_delay_comb   1.434926e+01 6.096125e+00 7.523904e-01 9.966706e-01
## L_dlPFC_enc_delay_comb  5.423634e-04 5.565893e-02 1.897897e+00 1.160586e+00
## L_dMFG_enc_delay_comb   4.849584e-01 1.075859e+01 1.343891e-02 1.003448e-02
## L_IPS_enc_delay_comb    5.096030e-01 1.097080e+01 1.927426e+00 9.999080e-01
## L_preSMA_enc_delay_comb 2.591972e-02 3.978785e+00 4.482694e-01 1.949225e-01
## R_dlPFC_enc_delay_comb  8.620655e-01 7.533058e+00 2.360768e+01 4.613459e+00
## R_IPS_enc_delay_comb    6.437915e-01 1.431896e+00 3.533604e-03 1.416809e+01
## R_medPar_enc_delay_comb 1.849399e+01 2.665061e+00 3.468765e-02 2.574331e+00
## L_DELAY_LE              1.880791e-33 1.391484e-30 9.103028e-31 7.042105e-31
## L_PROBE_LE              8.137054e-31 1.264644e-31 2.034264e-31 1.880791e-33
## R_CUE_LE                3.081488e-31 3.900008e-31 3.478711e-31 4.190101e-30
## R_DELAY_LE              1.083336e-32 2.437505e-30 2.708339e-31 7.373453e-31
## R_PROBE_LE              2.225653e-30 1.692712e-32 1.194377e-30 7.981325e-31
##                               Dim.37
## L_aMFG_enc              3.084328e+01
## L_aMFG_delay            8.258684e+00
## L_dlPFC_enc             8.449653e-01
## L_dlPFC_delay           6.119697e-01
## L_dMFG_enc              4.100186e-01
## L_dMFG_delay            5.264705e-01
## L_IPS_enc               6.075113e+00
## L_IPS_delay             9.217897e-01
## L_preSMA_enc            3.194635e+00
## L_preSMA_delay          7.246992e-01
## R_dlPFC_enc             9.615468e-01
## R_dlPFC_delay           6.064210e-01
## R_IPS_enc               1.437185e-01
## R_IPS_delay             2.449935e-02
## R_medPar_enc            3.071457e-01
## R_medPar_delay          2.187188e-01
## L_aMFG_enc_delay        2.972876e-01
## L_dlPFC_enc_delay       5.974476e+00
## L_dMFG_enc_delay        5.698163e-01
## L_IPS_enc_delay         5.512185e-02
## L_preSMA_enc_delay      7.914228e-02
## R_dlPFC_enc_delay       5.240696e-01
## R_IPS_enc_delay         1.124699e-03
## R_medPar_enc_delay      1.701809e+00
## L_aMFG_enc_delay_comb   1.748144e+01
## L_dlPFC_enc_delay_comb  9.562777e+00
## L_dMFG_enc_delay_comb   8.818974e-03
## L_IPS_enc_delay_comb    4.589148e+00
## L_preSMA_enc_delay_comb 1.570767e+00
## R_dlPFC_enc_delay_comb  2.064347e-02
## R_IPS_enc_delay_comb    1.079740e-01
## R_medPar_enc_delay_comb 2.781908e+00
## L_DELAY_LE              1.156762e-30
## L_PROBE_LE              4.814825e-33
## R_CUE_LE                1.925930e-32
## R_DELAY_LE              3.081488e-31
## R_PROBE_LE              0.000000e+00
for (axis in seq.int(1,3)){
  print(fviz_contrib(res.pca, choice = "var", axes = axis, top = 10))
}

fviz_pca_var(res.pca,
             col.var = "contrib", # Color by contributions to the PC
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE     # Avoid text overlapping
)

Similarity measures

We’re going to the same thing for the similarity measures, since we have the same issue (though not quite as badly).

We cover 78% of the variance by PC5, so we’ll focus on those.

PC1: Similarity within trial PC2: Individual to template DFR trials - particularly correct trials PC3: High load correct trials in DFR, encoding to correct delay trials in high load correct trials in fusiform PC4: Similarity within high load incorrect trials during delay and encoding PC5: Correlation within low correct and high incorrect trials in DFR mask

res_sim.pca <- prcomp(data_for_reg[,c(39:56)], scale = TRUE)
fviz_eig(res_sim.pca)

summary(res_sim.pca)
## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5     PC6     PC7
## Standard deviation     2.6812 1.5703 1.25092 1.07857 0.99578 0.92715 0.83264
## Proportion of Variance 0.3994 0.1370 0.08693 0.06463 0.05509 0.04776 0.03852
## Cumulative Proportion  0.3994 0.5364 0.62330 0.68793 0.74302 0.79078 0.82929
##                            PC8     PC9    PC10    PC11    PC12    PC13    PC14
## Standard deviation     0.74977 0.70015 0.67719 0.58768 0.57033 0.50182 0.47232
## Proportion of Variance 0.03123 0.02723 0.02548 0.01919 0.01807 0.01399 0.01239
## Cumulative Proportion  0.86052 0.88776 0.91323 0.93242 0.95049 0.96448 0.97688
##                           PC15    PC16    PC17    PC18
## Standard deviation     0.37334 0.33247 0.30531 0.27034
## Proportion of Variance 0.00774 0.00614 0.00518 0.00406
## Cumulative Proportion  0.98462 0.99076 0.99594 1.00000
res_sim.var <- get_pca_var(res_sim.pca)
#res_sim.var$contrib      # Contributions to the PCs

for (axis in seq.int(1,5)){
  print(fviz_contrib(res_sim.pca, choice = "var", axes = axis, top = 10))
}

fviz_pca_var(res_sim.pca,
             col.var = "contrib", # Color by contributions to the PC
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE     # Avoid text overlapping
)

MVPA

Now, let’s do the same for the MVPA measures. We end up needing 12 PCs to reflect 75% of the variance.

PC1: Fusiform during encoding and probe PC2: DFR during encoding PC3: DFR during probe, averages from template across trial type PC4: Delay period decoding PC5: Fusiform encoding during template – most variance comes from high incorrect trials during individual trials PC6: Individual trials in fusiform, mostly during delay - mostly high incorrect individual trials PC7: Individual trials, low correct and high incorrect, but most variance comes from high incorect trials in fusiform during delay from averages PC8: High incorrect DFR across entire trial from individual trials PC9: DFR decoding during correct trials in encoding and probe PC10: Correct averaged from template trials during delay and probe - mostly DFR during probe PC11: Low correct trials, mostly delay period after largest variance explained from fusiform during encoding PC12: Mostly correct trials from templates in the fusiform - most variance explained by encoding and probe

res_MVPA.pca <- prcomp(data_for_reg[,c(57:92)], scale = TRUE)
fviz_eig(res_MVPA.pca)

summary(res_MVPA.pca)
## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5     PC6     PC7
## Standard deviation     2.1575 1.9266 1.84175 1.68366 1.59799 1.56683 1.21421
## Proportion of Variance 0.1293 0.1031 0.09422 0.07874 0.07093 0.06819 0.04095
## Cumulative Proportion  0.1293 0.2324 0.32663 0.40537 0.47630 0.54450 0.58545
##                            PC8     PC9    PC10    PC11    PC12    PC13    PC14
## Standard deviation     1.14888 1.08102 1.04333 1.00443 0.95488 0.93740 0.92004
## Proportion of Variance 0.03666 0.03246 0.03024 0.02802 0.02533 0.02441 0.02351
## Cumulative Proportion  0.62211 0.65457 0.68481 0.71284 0.73816 0.76257 0.78609
##                           PC15    PC16    PC17    PC18    PC19   PC20    PC21
## Standard deviation     0.84687 0.81679 0.79631 0.75825 0.72385 0.6971 0.68002
## Proportion of Variance 0.01992 0.01853 0.01761 0.01597 0.01455 0.0135 0.01285
## Cumulative Proportion  0.80601 0.82454 0.84215 0.85812 0.87268 0.8862 0.89902
##                          PC22   PC23    PC24    PC25    PC26    PC27    PC28
## Standard deviation     0.6434 0.6236 0.61003 0.59364 0.57402 0.56315 0.50385
## Proportion of Variance 0.0115 0.0108 0.01034 0.00979 0.00915 0.00881 0.00705
## Cumulative Proportion  0.9105 0.9213 0.93166 0.94145 0.95060 0.95941 0.96646
##                          PC29    PC30    PC31    PC32    PC33    PC34    PC35
## Standard deviation     0.4723 0.43151 0.41615 0.37341 0.37014 0.36314 0.34177
## Proportion of Variance 0.0062 0.00517 0.00481 0.00387 0.00381 0.00366 0.00324
## Cumulative Proportion  0.9727 0.97783 0.98264 0.98651 0.99032 0.99398 0.99723
##                           PC36
## Standard deviation     0.31598
## Proportion of Variance 0.00277
## Cumulative Proportion  1.00000
res_MVPA.var <- get_pca_var(res_MVPA.pca)
#res_MVPA.var$contrib      # Contributions to the PCs

for (axis in seq.int(1,12)){
  print(fviz_contrib(res_MVPA.pca, choice = "var", axes = axis, top = 10))
}

fviz_pca_var(res_MVPA.pca,
             col.var = "contrib", # Color by contributions to the PC
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE     # Avoid text overlapping
)

Structural Data

For the structural data, we see that 19 PCs are necessary to explain 75% of the variance. There was a lot of overlap in these, so it was kind of tough to describe/classify them

PC1: Parietal lobe structures and some superior frontal structures PC2: Subcortical structures (including cerebellum) PC3: Cuneus, pericalcarine, superior parietal, lingual PC4: Temporal pole, entorhinal PC5: Lingual, perihippocampal, pericalcarine PC6: corpus callosum, caudate PC7: More subcortical PC8: Frontal pole (and other frontal regions) PC9: corpus callosum, hippocampus and surrounding regions PC10: Cerebellum, frontal PC11: Parahippocampal and lateral occipital PC12: Temporal, entorhinal, parahippocampal PC13: Temporal pole, temporal cortex, amygdala and HPC PC14: pars orbitalis, cingulate, accumbens PC15: left hemisphere subcortical, pars triangularis PC16: Temporal, orbitofrontal , frontal pole PC17: banks sts, some temporal and parietal cortex PC18: Cingulate, frontal PC19: HPC, lateral orbitofrontal poles

res_struct.pca <- prcomp(struc_df[,c(4:37, 40:96)], scale = TRUE)
fviz_eig(res_struct.pca)

summary(res_struct.pca)
## Importance of components:
##                          PC1    PC2     PC3    PC4     PC5    PC6     PC7
## Standard deviation     4.808 3.0603 2.45392 1.9873 1.78776 1.6467 1.56606
## Proportion of Variance 0.254 0.1029 0.06617 0.0434 0.03512 0.0298 0.02695
## Cumulative Proportion  0.254 0.3569 0.42311 0.4665 0.50163 0.5314 0.55838
##                            PC8     PC9    PC10    PC11    PC12    PC13    PC14
## Standard deviation     1.49575 1.36542 1.33629 1.29783 1.21173 1.19815 1.16793
## Proportion of Variance 0.02459 0.02049 0.01962 0.01851 0.01614 0.01578 0.01499
## Cumulative Proportion  0.58296 0.60345 0.62307 0.64158 0.65772 0.67349 0.68848
##                           PC15    PC16    PC17    PC18    PC19    PC20    PC21
## Standard deviation     1.15174 1.12525 1.06997 1.04709 1.02367 0.98047 0.96024
## Proportion of Variance 0.01458 0.01391 0.01258 0.01205 0.01152 0.01056 0.01013
## Cumulative Proportion  0.70306 0.71698 0.72956 0.74160 0.75312 0.76368 0.77382
##                           PC22    PC23    PC24   PC25    PC26    PC27    PC28
## Standard deviation     0.94282 0.92335 0.90870 0.8998 0.83794 0.82785 0.81884
## Proportion of Variance 0.00977 0.00937 0.00907 0.0089 0.00772 0.00753 0.00737
## Cumulative Proportion  0.78358 0.79295 0.80203 0.8109 0.81864 0.82617 0.83354
##                           PC29    PC30    PC31    PC32    PC33    PC34    PC35
## Standard deviation     0.81073 0.79640 0.77974 0.75909 0.74232 0.72511 0.71232
## Proportion of Variance 0.00722 0.00697 0.00668 0.00633 0.00606 0.00578 0.00558
## Cumulative Proportion  0.84076 0.84773 0.85441 0.86075 0.86680 0.87258 0.87815
##                           PC36    PC37    PC38    PC39    PC40    PC41    PC42
## Standard deviation     0.70242 0.69823 0.68353 0.67518 0.66311 0.65209 0.64481
## Proportion of Variance 0.00542 0.00536 0.00513 0.00501 0.00483 0.00467 0.00457
## Cumulative Proportion  0.88358 0.88893 0.89407 0.89908 0.90391 0.90858 0.91315
##                           PC43    PC44    PC45    PC46   PC47    PC48    PC49
## Standard deviation     0.63751 0.60464 0.60274 0.58650 0.5720 0.56529 0.55961
## Proportion of Variance 0.00447 0.00402 0.00399 0.00378 0.0036 0.00351 0.00344
## Cumulative Proportion  0.91762 0.92163 0.92563 0.92941 0.9330 0.93651 0.93996
##                           PC50    PC51    PC52    PC53    PC54    PC55    PC56
## Standard deviation     0.55386 0.54381 0.53336 0.51437 0.50853 0.49775 0.48243
## Proportion of Variance 0.00337 0.00325 0.00313 0.00291 0.00284 0.00272 0.00256
## Cumulative Proportion  0.94333 0.94658 0.94970 0.95261 0.95545 0.95817 0.96073
##                           PC57    PC58    PC59    PC60    PC61    PC62    PC63
## Standard deviation     0.46643 0.45147 0.44692 0.43848 0.43119 0.41933 0.40997
## Proportion of Variance 0.00239 0.00224 0.00219 0.00211 0.00204 0.00193 0.00185
## Cumulative Proportion  0.96312 0.96536 0.96756 0.96967 0.97171 0.97365 0.97549
##                           PC64    PC65    PC66    PC67    PC68    PC69    PC70
## Standard deviation     0.40208 0.38511 0.38452 0.37470 0.36597 0.35508 0.34540
## Proportion of Variance 0.00178 0.00163 0.00162 0.00154 0.00147 0.00139 0.00131
## Cumulative Proportion  0.97727 0.97890 0.98052 0.98207 0.98354 0.98492 0.98623
##                           PC71    PC72    PC73    PC74    PC75    PC76    PC77
## Standard deviation     0.33971 0.32808 0.31296 0.31074 0.29668 0.28850 0.27115
## Proportion of Variance 0.00127 0.00118 0.00108 0.00106 0.00097 0.00091 0.00081
## Cumulative Proportion  0.98750 0.98869 0.98976 0.99082 0.99179 0.99270 0.99351
##                           PC78   PC79    PC80    PC81    PC82    PC83    PC84
## Standard deviation     0.26410 0.2532 0.24542 0.23750 0.23084 0.22053 0.20787
## Proportion of Variance 0.00077 0.0007 0.00066 0.00062 0.00059 0.00053 0.00047
## Cumulative Proportion  0.99428 0.9950 0.99565 0.99627 0.99685 0.99739 0.99786
##                           PC85    PC86    PC87    PC88    PC89    PC90    PC91
## Standard deviation     0.20240 0.19391 0.17881 0.15860 0.15075 0.14293 0.12598
## Proportion of Variance 0.00045 0.00041 0.00035 0.00028 0.00025 0.00022 0.00017
## Cumulative Proportion  0.99831 0.99872 0.99907 0.99935 0.99960 0.99983 1.00000
res_struct.var <- get_pca_var(res_struct.pca)

for (axis in seq.int(1,19)){
  print(fviz_contrib(res_struct.pca, choice = "var", axes = axis, top = 10))
}

fviz_pca_var(res_struct.pca,
             col.var = "contrib", # Color by contributions to the PC
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE     # Avoid text overlapping
)

FA data

For the FA data, we need 5 PCs to explain 75% of the variance:

PC1: Inferior frontal occipital fasiculus, forceps, anterior thalamic radiation (but really captures a lot of variance from a lot of structures) PC2: Cingulum/hippocampus PC3: cingulum PC4: Superior longitudinal fasiculus in temporal lobe in the left hemisphere and cortcospinal tract PC5: Superior longitudinal fasiculus in temporal lobe in the right hemisphere and cingulum

res_FA.pca <- prcomp(FA_Data[,2:21], scale = TRUE)
fviz_eig(res_FA.pca)

summary(res_FA.pca)
## Importance of components:
##                           PC1     PC2    PC3     PC4     PC5     PC6     PC7
## Standard deviation     3.2423 1.24533 1.2141 1.02058 1.00489 0.86869 0.82390
## Proportion of Variance 0.5256 0.07754 0.0737 0.05208 0.05049 0.03773 0.03394
## Cumulative Proportion  0.5256 0.60316 0.6769 0.72894 0.77943 0.81716 0.85111
##                            PC8     PC9    PC10    PC11    PC12    PC13    PC14
## Standard deviation     0.72827 0.64431 0.56960 0.53522 0.49422 0.47438 0.45150
## Proportion of Variance 0.02652 0.02076 0.01622 0.01432 0.01221 0.01125 0.01019
## Cumulative Proportion  0.87762 0.89838 0.91460 0.92893 0.94114 0.95239 0.96258
##                           PC15    PC16    PC17    PC18    PC19    PC20
## Standard deviation     0.41569 0.38081 0.36814 0.33655 0.31425 0.28804
## Proportion of Variance 0.00864 0.00725 0.00678 0.00566 0.00494 0.00415
## Cumulative Proportion  0.97122 0.97847 0.98525 0.99091 0.99585 1.00000
res_FA.var <- get_pca_var(res_FA.pca)

for (axis in seq.int(1,5)){
  print(fviz_contrib(res_FA.pca, choice = "var", axes = axis, top = 10))
}

fviz_pca_var(res_FA.pca,
             col.var = "contrib", # Color by contributions to the PC
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE     # Avoid text overlapping
)

EEG Data

For the EEG data, we see that 11 PCs are necessary to explain 75% of the variance.

PC1: Alpha and beta at high load during probe PC2: ERSPs at probe during low load trials; specifically, low gamma
PC3: Alpha and beta power at delay, cue P3 PC4: Low gamma PC5: Theta power, n170 amplitude PC6: Theta power during cue, probe P3 PC7: CDA across lower load conditions (mostly 1 dot, some 3 dots), P3 PC8: CDA across higher load conditions (mostly 5 dot, some 3 dots), n170 at high load PC9: P3 component PC10: CDA - 5 dot condition has the most variance, really a mix PC11: CDA - 1 dot condition and also 5 dot condition (but high load has less contributions)

res_EEG.pca <- prcomp(EEG_df[complete.cases(EEG_df),2:66], scale = TRUE)
fviz_eig(res_EEG.pca)

summary(res_EEG.pca)
## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5     PC6     PC7
## Standard deviation     3.5923 2.6623 2.28645 2.23038 1.97288 1.79846 1.70510
## Proportion of Variance 0.1985 0.1090 0.08043 0.07653 0.05988 0.04976 0.04473
## Cumulative Proportion  0.1985 0.3076 0.38800 0.46454 0.52442 0.57418 0.61891
##                            PC8     PC9    PC10    PC11    PC12    PC13    PC14
## Standard deviation     1.62484 1.50068 1.41758 1.31305 1.24195 1.20095 1.13235
## Proportion of Variance 0.04062 0.03465 0.03092 0.02652 0.02373 0.02219 0.01973
## Cumulative Proportion  0.65952 0.69417 0.72509 0.75161 0.77534 0.79753 0.81726
##                           PC15    PC16    PC17   PC18    PC19   PC20   PC21
## Standard deviation     1.03640 1.00283 0.99154 0.9505 0.91791 0.8569 0.7476
## Proportion of Variance 0.01653 0.01547 0.01513 0.0139 0.01296 0.0113 0.0086
## Cumulative Proportion  0.83378 0.84925 0.86438 0.8783 0.89124 0.9025 0.9111
##                           PC22    PC23    PC24    PC25    PC26    PC27    PC28
## Standard deviation     0.69197 0.69084 0.65622 0.63315 0.61055 0.59280 0.55211
## Proportion of Variance 0.00737 0.00734 0.00662 0.00617 0.00574 0.00541 0.00469
## Cumulative Proportion  0.91850 0.92584 0.93247 0.93864 0.94437 0.94978 0.95447
##                           PC29    PC30    PC31    PC32    PC33    PC34    PC35
## Standard deviation     0.51556 0.51158 0.49653 0.46572 0.44426 0.41169 0.40618
## Proportion of Variance 0.00409 0.00403 0.00379 0.00334 0.00304 0.00261 0.00254
## Cumulative Proportion  0.95856 0.96258 0.96638 0.96971 0.97275 0.97536 0.97789
##                          PC36    PC37    PC38    PC39    PC40    PC41    PC42
## Standard deviation     0.3863 0.38145 0.35916 0.33437 0.30273 0.29564 0.29304
## Proportion of Variance 0.0023 0.00224 0.00198 0.00172 0.00141 0.00134 0.00132
## Cumulative Proportion  0.9802 0.98243 0.98441 0.98613 0.98754 0.98889 0.99021
##                           PC43    PC44    PC45    PC46    PC47    PC48    PC49
## Standard deviation     0.28214 0.25737 0.24047 0.23730 0.23107 0.22168 0.20836
## Proportion of Variance 0.00122 0.00102 0.00089 0.00087 0.00082 0.00076 0.00067
## Cumulative Proportion  0.99143 0.99245 0.99334 0.99421 0.99503 0.99579 0.99645
##                           PC50   PC51    PC52    PC53    PC54    PC55    PC56
## Standard deviation     0.19864 0.1804 0.17544 0.15914 0.14127 0.13091 0.12498
## Proportion of Variance 0.00061 0.0005 0.00047 0.00039 0.00031 0.00026 0.00024
## Cumulative Proportion  0.99706 0.9976 0.99804 0.99843 0.99873 0.99900 0.99924
##                          PC57    PC58    PC59    PC60    PC61    PC62      PC63
## Standard deviation     0.1147 0.10203 0.09564 0.09081 0.07391 0.05689 1.535e-07
## Proportion of Variance 0.0002 0.00016 0.00014 0.00013 0.00008 0.00005 0.000e+00
## Cumulative Proportion  0.9994 0.99960 0.99974 0.99987 0.99995 1.00000 1.000e+00
##                             PC64      PC65
## Standard deviation     1.484e-07 1.085e-07
## Proportion of Variance 0.000e+00 0.000e+00
## Cumulative Proportion  1.000e+00 1.000e+00
res_EEG.var <- get_pca_var(res_EEG.pca)

for (axis in seq.int(1,11)){
  print(fviz_contrib(res_EEG.pca, choice = "var", axes = axis, top = 10))
}

fviz_pca_var(res_EEG.pca,
             col.var = "contrib", # Color by contributions to the PC
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE     # Avoid text overlapping
)

RS Data

For the resting state data, we see that 8 PCs are necessary to explain 75% of the variance.

PC1: Participation coefficients across network (both average and individual network) PC2: FPCN connectivity with other networks, within connectivity for DMN and DAN PC3: Within connectivity for DMN, FPCN connectivity with visual, CO, DMN, DANA networks PC4: Global efficiency, within CO connectivity, participation coefficients PC5: Within visual connectivity and participation coefficient, within CO connectivity PC6: Louvain modularity, within DAN connectivity PC7: Within network VAN connectivity PC8: Global efficiency, FPCN connectivity with DMN, FPCN and DAN, VAN intranetwork connectivity

res_RS.pca <- prcomp(RS_df[complete.cases(RS_df),2:21], scale = TRUE)
fviz_eig(res_RS.pca)

summary(res_RS.pca)
## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5     PC6     PC7
## Standard deviation     2.1445 2.0285 1.29812 1.13191 1.04771 0.98921 0.94066
## Proportion of Variance 0.2299 0.2057 0.08426 0.06406 0.05489 0.04893 0.04424
## Cumulative Proportion  0.2299 0.4357 0.51995 0.58401 0.63890 0.68783 0.73207
##                            PC8     PC9   PC10    PC11    PC12    PC13    PC14
## Standard deviation     0.90541 0.84708 0.8087 0.76047 0.75206 0.68202 0.60154
## Proportion of Variance 0.04099 0.03588 0.0327 0.02892 0.02828 0.02326 0.01809
## Cumulative Proportion  0.77306 0.80893 0.8416 0.87055 0.89883 0.92209 0.94018
##                          PC15    PC16    PC17    PC18    PC19    PC20
## Standard deviation     0.5745 0.50112 0.46856 0.41335 0.39669 0.25988
## Proportion of Variance 0.0165 0.01256 0.01098 0.00854 0.00787 0.00338
## Cumulative Proportion  0.9567 0.96923 0.98021 0.98875 0.99662 1.00000
res_RS.var <- get_pca_var(res_RS.pca)

for (axis in seq.int(1,8)){
  print(fviz_contrib(res_RS.pca, choice = "var", axes = axis, top = 10))
}

fviz_pca_var(res_RS.pca,
             col.var = "contrib", # Color by contributions to the PC
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE     # Avoid text overlapping
)

Now that we have our PCs to reduce overlapping variance, we can use these in a series of regressions to predict span, BPRS and accuracy at high load. We will use a number of different methods to select features: stepwise regression, ridge regression, LASSO regression and ElasticNet regression. A brief description of the methods are as follows:

Stepwise Regression: Initially takes all parameters and sequentially removes parameters to minimize the AIC of the model overall. At each step, it calculates what the AIC would be if it removed each individual parameter and then removes the one that makes the most difference. Stops when model converges at a minimal AIC.

Ridge Regression: A type of penalized linear regression (uses L2 norm - minimizes the sum of the squared coefficients). Shrinks the values of unimportant coefficients close to zero, but does not remove them.

LASSO Regression: Another type of penalized linear regression (uses L1 norm - minimizes the sum of the absolute value of the coefficients). Forces coefficients that are unimportant to the model to be zero, creating a sparse model.

ElasticNet Regression: Uses both L1- and L2-norm penalties with regression, so shrinks some coefficients close to zero and some to exactly zero.

Each model is performed in a 10 fold cross validation, with inner hyperparameter tuning. There are 10% of subjects held out over each of the CV loops, so R squared is calculated using predictions on the held out data and from the best model for each cross validation, meaning that every subject gets a prediction.

When listing the measures that occur in each model after feature selection, the features listed are those that show up in 7 or more cross validation iterations.

Across the three potential feature selection procedures, a different model explains the most variance for each behavioral measure. However, across all three, ElasticNet performs consistently the best across the measures (2nd best [roughly the same] for span, second best for BPRS and best for high load accuracy), so we will use that as the model to compare across behavioral measure.

Predicting Span

Prep data

reg_data <- data.frame(PTID = constructs_fMRI$PTID, 
                       span = data_for_reg$span, 
                       high_acc = data_for_reg$high_acc,
                       BPRS = data_for_reg$BPRS) 

univ_PCs <- data.frame(PTID = constructs_fMRI$PTID[!is.na(data_for_reg$L_CUE_LE)], res.pca[["x"]][,1:3])
colnames(univ_PCs)[2:4] <- paste("PC",c(1:3),"_univ",sep="")
sim_PCs <- data.frame(PTID = constructs_fMRI$PTID, res_sim.pca[["x"]][,1:5])
colnames(sim_PCs)[2:6] <- paste("PC",c(1:5),"_sim",sep="")
MVPA_PCs <- data.frame(PTID = constructs_fMRI$PTID, res_MVPA.pca[["x"]][,1:12])
colnames(MVPA_PCs)[2:13] <- paste("PC",c(1:12),"_MVPA",sep="")
EEG_PCs <- data.frame(PTID = EEG_df$PTID[complete.cases(EEG_df)], res_EEG.pca[["x"]][,1:11])
colnames(EEG_PCs)[2:12] <- paste("PC",c(1:11),"_EEG",sep="")
RS_PCs <- data.frame(PTID = RS_df$PTID, res_RS.pca[["x"]][,1:8])
colnames(RS_PCs)[2:9] <- paste("PC",c(1:8),"_RS",sep="")
struc_PCs <- data.frame(PTID=struc_df$ID, res_struct.pca[["x"]][,1:19])
colnames(struc_PCs)[2:20] <- paste("PC",c(1:19),"_struc",sep="")
FA_PCs <- data.frame(PTID = FA_Data$ID, res_FA.pca[["x"]][,1:5])
colnames(FA_PCs)[2:6] <- paste("PC", c(1:5), "_FA", sep="")

reg_data <- Reduce(function(x,y) merge(x=x, y=y, by = "PTID", all.x = TRUE), 
                   list(reg_data, univ_PCs, sim_PCs, MVPA_PCs, EEG_PCs, RS_PCs, struc_PCs, FA_PCs))

#add scanner, age, gender (won't include CS/NCS in regression) and dummy code 
reg_data <- merge(reg_data, p200_demographics, by = "PTID")
reg_data$SCANNER <- reg_data$SCANNER - 1
reg_data$GENDER <- reg_data$GENDER - 1

reg_data <- reg_data[complete.cases(reg_data),1:70]

set.seed(123)
lambda <- 10^seq(-3, 3, length = 100)

# save(list = c("univ_PCs", "sim_PCs", "MVPA_PCs", "EEG_PCs", "RS_PCs", "struc_PCs", "FA_PCs"), file = "~/Documents/Code/RDoC_for_GitHub/data/PCs.RData")

Stepwise Regression

While the final stepwise model is significant overall, we see significant effects of:

  • PC3_univ: encoding - delay in parietal regions and fusiform activity during delay
  • PC5_MVPA: Fusiform encoding during template – most variance comes from high incorrect trials during individual trials
  • PC8_MVPA: High incorrect DFR across entire trial from individual trials
  • PC12_MVPA: Mostly correct trials from templates in the fusiform - most variance explained by encoding and probe
  • PC2_EEG: ERSPs at probe during low load trials; specifically, low gamma
  • PC3_EEG: Alpha power at delay, beta during cue
  • PC6_EEG: N170 amplitude and high load Oz power across frequency band
  • PC8_EEG: Low gamma in mid occipital component
  • PC9_EEG: P3 component
  • PC1_RS: Participation coefficients across network (both average and individual network)
  • PC2_RS: FPCN connectivity with other networks, within connectivity for DMN and DAN
  • PC2_FA: Cingulum/hippocampus
  • PC4_struct: Temporal pole, entorhinal
  • PC6_struct: corpus callosum, caudate
  • PC12_struct: Temporal, entorhinal, parahippocampal
  • PC15_struct: left hemisphere subcortical, pars triangularis
  • Scanner
  • Gender
  • Age
# set up outer cross validation 
split <- vfold_cv(reg_data,v=10)

best_models_step_span <- list()
RMSE_cv_step_span <- data.frame(matrix(ncol=10, nrow=1))

for (fold in seq.int(1,10)){ 
  
  # set up training and test set for fold
  train_data <-  analysis(split$splits[[fold]])
  train_data <- train_data[,c(2, 5:70)]
  test_data <- assessment(split$splits[[fold]])
  test_data <- test_data[,c(2, 5:70)]
  train_data[,1:67] <- sapply(train_data[,1:67],scale)
  train_data <- data.frame(train_data)
  test_data[,1:67] <- sapply(test_data[,1:67], scale)
  test_data <- data.frame(test_data)
  
  step.span <- train(span ~., data = train_data,
                     method = "lmStepAIC", 
                     trControl = trainControl("cv", number = 10),
                     trace = FALSE
  )
  best_models_step_span[[fold]] <- step.span$finalModel
  preds <- step.span %>% predict(test_data)
  
  # Make predictions
  if (fold == 1){ 
    cv_preds_step_span <- data.frame(true = test_data$span, pred = preds)
  }else{ 
    cv_preds_step_span <- rbind(cv_preds_step_span, data.frame(true = test_data$span, pred = preds))}
  
  RMSE_cv_step_span[fold] <- RMSE(preds, test_data$span)
  
}
model_comparison_template <- list(univ = data.frame(matrix(nrow=10, ncol=3)), 
                                  sim = data.frame(matrix(nrow=10, ncol=5)), 
                                  MVPA = data.frame(matrix(nrow=10, ncol=12)), 
                                  EEG = data.frame(matrix(nrow=10, ncol=11)), 
                                  RS = data.frame(matrix(nrow=10, ncol=8)), 
                                  FA = data.frame(matrix(nrow=10, ncol=5)), 
                                  struc = data.frame(matrix(nrow=10, ncol=19)), 
                                  demo = data.frame(matrix(nrow=10, ncol=3)))

colnames(model_comparison_template[["univ"]]) <- paste("PC",c(1:3),"_univ",sep="")
colnames(model_comparison_template[["sim"]]) <- paste("PC",c(1:5),"_sim",sep="")
colnames(model_comparison_template[["MVPA"]]) <- paste("PC",c(1:12),"_MVPA",sep="")
colnames(model_comparison_template[["EEG"]]) <- paste("PC",c(1:11),"_EEG",sep="")
colnames(model_comparison_template[["RS"]]) <- paste("PC",c(1:8),"_RS",sep="")
colnames(model_comparison_template[["struc"]]) <- paste("PC",c(1:19),"_struc",sep="")
colnames(model_comparison_template[["FA"]]) <- paste("PC", c(1:5), "_FA", sep="")
colnames(model_comparison_template[["demo"]]) <- c("GENDER", "AGE", "SCANNER")
step_span_best_coeffs <- model_comparison_template

for (fold in seq.int(1,10)){ 
  coefs <- rownames(data.frame(best_models_step_span[[fold]]$coefficients))
  split_coef <- strsplit(coefs, split="_")
  for (row in seq.int(2,length(coefs))){
    if (length(split_coef[[row]]) > 1){
      PC <- as.numeric(strsplit(split_coef[[row]][[1]], split="PC")[[1]][[2]])
      step_span_best_coeffs[[split_coef[[row]][[2]]]][fold,PC] <- "x"
    }
  }
  
  if ("GENDER" %in% coefs){
    step_span_best_coeffs[["demo"]]$GENDER <- "x"
  }
  if ("AGE" %in% coefs){
    step_span_best_coeffs[["demo"]]$AGE <- "x"
  }
  if ("SCANNER" %in% coefs){
    step_span_best_coeffs[["demo"]]$SCANNER <- "x"
  }
}
compare_span <- data.frame(matrix(nrow=2, ncol=4))
colnames(compare_span) <- c("stepwise", "ridge", "lasso", "elastic")
rownames(compare_span) <- c("RMSE", "Rsquared")

compare_span$stepwise[1] <- rowMeans(RMSE_cv_step_span)
compare_span$stepwise[2] <- cor(cv_preds_step_span$true, cv_preds_step_span$pred)^2

Ridge Regression

# set up outer cross validation 
split <- vfold_cv(reg_data,v=10)

best_models_ridge_span <- list()
RMSE_cv_ridge_span <- data.frame(matrix(ncol=10, nrow=1))

for (fold in seq.int(1,10)){ 
  
  # set up training and test set for fold
  train_data <-  analysis(split$splits[[fold]])
  train_data <- train_data[,c(2, 5:70)]
  test_data <- assessment(split$splits[[fold]])
  test_data <- test_data[,c(2, 5:70)]
  train_data[,1:67] <- sapply(train_data[,1:67],scale)
  train_data <- data.frame(train_data)
  test_data[,1:67] <- sapply(test_data[,1:67], scale)
  test_data <- data.frame(test_data)
  ridge.span <- train(
    span ~., data = train_data, method = "glmnet",
    trControl = trainControl("cv", number = 10),
    tuneGrid = expand.grid(alpha = 0, lambda = lambda)
  )
  best_models_ridge_span[[fold]] <- coef(ridge.span$finalModel, ridge.span$bestTune$lambda)
  preds <- ridge.span %>% predict(test_data)
  
  # Make predictions
  if (fold == 1){ 
    cv_preds_ridge_span <- data.frame(true = test_data$span, pred = preds)
  }else{ 
    cv_preds_ridge_span <- rbind(cv_preds_ridge_span, data.frame(true = test_data$span, pred = preds))}
  
  RMSE_cv_ridge_span[fold] <- RMSE(preds, test_data$span)
  
}
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.
compare_span$ridge[1] <- rowMeans(RMSE_cv_ridge_span)
compare_span$ridge[2] <- cor(cv_preds_ridge_span$true, cv_preds_ridge_span$pred)^2

LASSO Regression

In the LASSO regression, we see:

  • PC1_univ: encoding load effects and the combined encoding+encoding-delay in the PFC regions. This value is negative, suggesting that stronger load effects is related to lower span
  • PC1_sim: Similarity within trial
  • PC3_MVPA: DFR during probe, averages from template across trial type
  • PC2_EEG: ERSPs at probe during low load trials; specifically, low gamma
  • PC3_EEG: Alpha power at delay, beta during cue
  • PC6_EEG: N170 amplitude and high load Oz power across frequency band
  • PC9_EEG: P3 component
  • PC2_FA: Cingulum/hippocampus
  • PC5_FA: Superior longitudinal fasiculus in temporal lobe in the right hemisphere and cingulum
  • PC13_struct: Temporal pole, temporal cortex, amygdala and HPC
  • PC15_struct: left hemisphere subcortical, pars triangularis
  • Scanner
  • Age
  • Gender
# set up outer cross validation 
split <- vfold_cv(reg_data,v=10)

best_models_lasso_span <- list()
RMSE_cv_lasso_span <- data.frame(matrix(ncol=10, nrow=1))

for (fold in seq.int(1,10)){ 
  
  # set up training and test set for fold
  train_data <-  analysis(split$splits[[fold]])
  train_data <- train_data[,c(2, 5:70)]
  test_data <- assessment(split$splits[[fold]])
  test_data <- test_data[,c(2, 5:70)]
  train_data[,1:67] <- sapply(train_data[,1:67],scale)
  train_data <- data.frame(train_data)
  test_data[,1:67] <- sapply(test_data[,1:67], scale)
  test_data <- data.frame(test_data)
  lasso.span <- train(
    span ~., data =train_data, method = "glmnet",
    trControl = trainControl("cv", number = 10),
    tuneGrid = expand.grid(alpha = 1, lambda = lambda)
  )
  best_models_lasso_span[[fold]] <- coef(lasso.span$finalModel, lasso.span$bestTune$lambda)
  preds <- lasso.span %>% predict(test_data)
  
  # Make predictions
  if (fold == 1){ 
    cv_preds_lasso_span <- data.frame(true = test_data$span, pred = preds)
  }else{ 
    cv_preds_lasso_span <- rbind(cv_preds_lasso_span, data.frame(true = test_data$span, pred = preds))}
  
  RMSE_cv_lasso_span[fold] <- RMSE(preds, test_data$span)
  
}
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.
compare_span$lasso[1] <- rowMeans(RMSE_cv_lasso_span)
compare_span$lasso[2] <- cor(cv_preds_lasso_span$true, cv_preds_lasso_span$pred)^2
lasso_span_best_coeffs <- model_comparison_template

for (fold in seq.int(1,10)){ 
  coefs <- rownames(best_models_lasso_span[[fold]])
  split_coef <- strsplit(coefs, split="_")
  for (row in seq.int(2,length(coefs))){
    if (length(split_coef[[row]]) > 1){
      PC <- as.numeric(strsplit(split_coef[[row]][[1]], split="PC")[[1]][[2]])
      if (best_models_lasso_span[[fold]][row]!= 0){
        lasso_span_best_coeffs[[split_coef[[row]][[2]]]][fold,PC] <- "x"
      }
    }
  }
  
  if ("GENDER" %in% coefs){
    lasso_span_best_coeffs[["demo"]]$GENDER <- "x"
  }
  if ("AGE" %in% coefs){
    lasso_span_best_coeffs[["demo"]]$AGE <- "x"
  }
  if ("SCANNER" %in% coefs){
    lasso_span_best_coeffs[["demo"]]$SCANNER <- "x"
  }
}

ElasticNet

  • PC1_univ: encoding load effects and the combined encoding+encoding-delay in the PFC regions. This value is negative, suggesting that stronger load effects is related to lower span
  • PC1_sim: Similarity within trial
  • PC3_sim: High load correct trials in DFR, encoding to correct delay trials in high load correct trials in fusiform
  • PC3_MVPA: DFR during probe, averages from template across trial type
  • PC4_MVPA: Delay period decoding
  • PC11_MVPA: Low correct trials, mostly delay period after largest variance explained from fusiform during encoding
  • PC12_MVPA: Mostly correct trials from templates in the fusiform - most variance explained by encoding and probe
  • PC2_EEG: ERSPs at probe during low load trials; specifically, low gamma
  • PC3_EEG: Alpha power at delay, beta during cue
  • PC5_EEG: Theta power
  • PC6_EEG: N170 amplitude and high load Oz power across frequency band
  • PC9_EEG: P3 component
  • PC1_RS: Participation coefficients across network (both average and individual network)
  • PC4_RS: Global efficiency, within CO connectivity, participation coefficients
  • PC2_FA: Cingulum/hippocampus
  • PC5_FA: Superior longitudinal fasiculus in temporal lobe in the right hemisphere and cingulum
  • PC13_struct: Temporal pole, temporal cortex, amygdala and HPC
  • PC15_struct: left hemisphere subcortical, pars triangularis
  • PC16_struct Temporal, orbitofrontal , frontal pole
  • Scanner
  • Age
  • Gender

It seems as though these are particularly weighted towards encoding and regions associated with visual processing. Structural seems to show some HPC involvement. While accuracy seems to be only weighted towards visual components, predicting span also includes measures at low load, from the delay period, and all modalities.

# set up outer cross validation 
split <- vfold_cv(reg_data,v=10)

best_models_elastic_span <- list()
RMSE_cv_elastic_span <- data.frame(matrix(ncol=10, nrow=1))

for (fold in seq.int(1,10)){ 
  
  # set up training and test set for fold
  train_data <-  analysis(split$splits[[fold]])
  train_data <- train_data[,c(2, 5:70)]
  test_data <- assessment(split$splits[[fold]])
  test_data <- test_data[,c(2, 5:70)]
  train_data[,1:67] <- sapply(train_data[,1:67],scale)
  train_data <- data.frame(train_data)
  test_data[,1:67] <- sapply(test_data[,1:67], scale)
  test_data <- data.frame(test_data)
  
  elastic.span <- train(
    span ~., data = train_data, method = "glmnet",
    trControl = trainControl("cv", number = 10),
    tuneLength = 10
  )
  best_models_elastic_span[[fold]] <- elastic.span$finalModel
  preds <- elastic.span %>% predict(test_data)
  
  # Make predictions
  if (fold == 1){ 
    cv_preds_elastic_span <- data.frame(true = test_data$span, pred = preds)
  }else{ 
    cv_preds_elastic_span <- rbind(cv_preds_elastic_span, data.frame(true = test_data$span, pred = preds))}
  
  RMSE_cv_elastic_span[fold] <- RMSE(preds, test_data$span)
  
}
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.
compare_span$elastic[1] <- rowMeans(RMSE_cv_elastic_span)
compare_span$elastic[2] <- cor(cv_preds_elastic_span$true, cv_preds_elastic_span$pred)^2
elastic_span_best_coeffs <- model_comparison_template

for (fold in seq.int(1,10)){ 
  coefs <- coef(best_models_elastic_span[[fold]], s = best_models_elastic_span[[fold]]$tuneValue$lambda)
  split_coef <- strsplit(rownames(coefs), split="_")
  for (row in seq.int(2,length(coefs))){
    if (length(split_coef[[row]]) > 1){
      PC <- as.numeric(strsplit(split_coef[[row]][[1]], split="PC")[[1]][[2]])
      if (coefs[row]!= 0){
        elastic_span_best_coeffs[[split_coef[[row]][[2]]]][fold,PC] <- "x"
      }
    }
  }
  
  if ("GENDER" %in% rownames(coefs)){
    elastic_span_best_coeffs[["demo"]]$GENDER <- "x"
  }
  if ("AGE" %in% rownames(coefs)){
    elastic_span_best_coeffs[["demo"]]$AGE <- "x"
  }
  if ("SCANNER" %in% rownames(coefs)){
    elastic_span_best_coeffs[["demo"]]$SCANNER <- "x"
  }
}

Compare Models

When comparing models, LASSO is our best model that contains feature selection, explaining ~9% of the variance in span. Ridge regression does slightly better, explaining 11% of the variance.

compare_span
##             stepwise     ridge     lasso    elastic
## RMSE     1.229236514 0.9107650 0.9069912 0.93027114
## Rsquared 0.006068862 0.1053506 0.1215684 0.08044747

Predicting BPRS

Stepwise regression

To predict BPRS, we see the PCs that contain important information are:

  • PC3_univ: encoding - delay in parietal regions and fusiform activity during delay
  • PC1_sim: Similarity within trial
  • PC5_MVPA: Fusiform encoding during template – most variance comes from high incorrect trials during individual trials
  • PC1_EEG: Alpha and beta at high load during probe
  • PC7_EEG: P3, n170 amplitude; low gamma
  • PC11_EEG: Oz at low load, mid occipital at high load
  • PC4_RS: Global efficiency, within CO connectivity, participation coefficients
  • PC2_FA: Cingulum/hippocampus
  • PC3_FA: cingulum
  • PC5_FA: Superior longitudinal fasiculus in temporal lobe in the right hemisphere and cingulum
  • PC2_struct: Subcortical structures (including cerebellum)
  • PC4_struct: Temporal pole, entorhinal
  • PC7_struct: More subcortical
  • PC8_struct: Frontal pole (and other frontal regions)
  • PC14_struct: pars orbitalis, cingulate, accumbens
  • PC15_struct: left hemisphere subcortical, pars triangularis
  • Age
  • Scanner
# set up outer cross validation 
split <- vfold_cv(reg_data,v=10)

best_models_step_BPRS <- list()
RMSE_cv_step_BPRS <- data.frame(matrix(ncol=10, nrow=1))

for (fold in seq.int(1,10)){ 
  
  # set up training and test set for fold
  train_data <-  analysis(split$splits[[fold]])
  train_data <- train_data[,c(4, 5:70)]
  test_data <- assessment(split$splits[[fold]])
  test_data <- test_data[,c(4, 5:70)]
  train_data[,1:67] <- sapply(train_data[,1:67],scale)
  train_data <- data.frame(train_data)
  test_data[,1:67] <- sapply(test_data[,1:67], scale)
  test_data <- data.frame(test_data)
  
  step.BPRS <- train(BPRS ~., data = train_data,
                     method = "lmStepAIC", 
                     trControl = trainControl("cv", number = 10),
                     trace = FALSE
  )
  best_models_step_BPRS[[fold]] <- step.BPRS$finalModel
  preds <- step.BPRS %>% predict(test_data)
  
  # Make predictions
  if (fold == 1){ 
    cv_preds_step_BPRS <- data.frame(true = test_data$BPRS, pred = preds)
  }else{ 
    cv_preds_step_BPRS <- rbind(cv_preds_step_BPRS, data.frame(true = test_data$BPRS, pred = preds))}
  
  RMSE_cv_step_BPRS[fold] <- RMSE(preds, test_data$BPRS)
  
}
step_BPRS_best_coeffs <- model_comparison_template

for (fold in seq.int(1,10)){ 
  coefs <- rownames(data.frame(best_models_step_BPRS[[fold]]$coefficients))
  split_coef <- strsplit(coefs, split="_")
  for (row in seq.int(2,length(coefs))){
    if (length(split_coef[[row]]) > 1){
      PC <- as.numeric(strsplit(split_coef[[row]][[1]], split="PC")[[1]][[2]])
      step_BPRS_best_coeffs[[split_coef[[row]][[2]]]][fold,PC] <- "x"
    }
  }
  
  if ("GENDER" %in% coefs){
    step_BPRS_best_coeffs[["demo"]]$GENDER <- "x"
  }
  if ("AGE" %in% coefs){
    step_BPRS_best_coeffs[["demo"]]$AGE <- "x"
  }
  if ("SCANNER" %in% coefs){
    step_BPRS_best_coeffs[["demo"]]$SCANNER <- "x"
  }
}
compare_BPRS <- data.frame(matrix(nrow=2, ncol=4))
colnames(compare_BPRS) <- c("stepwise", "ridge", "lasso", "elastic")
rownames(compare_BPRS) <- c("RMSE", "Rsquared")

compare_BPRS$stepwise[1] <- rowMeans(RMSE_cv_step_BPRS)
compare_BPRS$stepwise[2] <- cor(cv_preds_step_BPRS$true, cv_preds_step_BPRS$pred)^2

Ridge Regression

# set up outer cross validation 
split <- vfold_cv(reg_data,v=10)

best_models_ridge_BPRS <- list()
RMSE_cv_ridge_BPRS <- data.frame(matrix(ncol=10, nrow=1))

for (fold in seq.int(1,10)){ 
  
  # set up training and test set for fold
  train_data <-  analysis(split$splits[[fold]])
  train_data <- train_data[,c(4, 5:70)]
  test_data <- assessment(split$splits[[fold]])
  test_data <- test_data[,c(4, 5:70)]
  train_data[,1:67] <- sapply(train_data[,1:67],scale)
  train_data <- data.frame(train_data)
  test_data[,1:67] <- sapply(test_data[,1:67], scale)
  test_data <- data.frame(test_data)
  ridge.BPRS <- train(
    BPRS ~., data = train_data, method = "glmnet",
    trControl = trainControl("cv", number = 10),
    tuneGrid = expand.grid(alpha = 0, lambda = lambda)
  )
  best_models_ridge_BPRS[[fold]] <- coef(ridge.BPRS$finalModel, ridge.BPRS$bestTune$lambda)
  preds <- ridge.BPRS %>% predict(test_data)
  
  # Make predictions
  if (fold == 1){ 
    cv_preds_ridge_BPRS <- data.frame(true = test_data$BPRS, pred = preds)
  }else{ 
    cv_preds_ridge_BPRS <- rbind(cv_preds_ridge_BPRS, data.frame(true = test_data$BPRS, pred = preds))}
  
  RMSE_cv_ridge_BPRS[fold] <- RMSE(preds, test_data$BPRS)
  
}
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.
compare_BPRS$ridge[1] <- rowMeans(RMSE_cv_ridge_BPRS)
compare_BPRS$ridge[2] <- cor(cv_preds_ridge_BPRS$true, cv_preds_ridge_BPRS$pred)^2

LASSO Regression

LASSO gives us almost a combination of the above analyses:

  • PC7_EEG: P3, n170 amplitude; low gamma
  • PC5_RS: Within visual connectivity and participation coefficient, within CO connectivity
  • PC7_struct: More subcortical
  • PC14_struct: pars orbitalis, cingulate, accumbens
  • PC15_struct: left hemisphere subcortical, pars triangularis
  • Gender
  • Age
  • Scanner
# set up outer cross validation 
split <- vfold_cv(reg_data,v=10)

best_models_lasso_BPRS <- list()
RMSE_cv_lasso_BPRS <- data.frame(matrix(ncol=10, nrow=1))

for (fold in seq.int(1,10)){ 
  
  # set up training and test set for fold
  train_data <-  analysis(split$splits[[fold]])
  train_data <- train_data[,c(4, 5:70)]
  test_data <- assessment(split$splits[[fold]])
  test_data <- test_data[,c(4, 5:70)]
  train_data[,1:67] <- sapply(train_data[,1:67],scale)
  train_data <- data.frame(train_data)
  test_data[,1:67] <- sapply(test_data[,1:67], scale)
  test_data <- data.frame(test_data)
  lasso.BPRS <- train(
    BPRS ~., data =train_data, method = "glmnet",
    trControl = trainControl("cv", number = 10),
    tuneGrid = expand.grid(alpha = 1, lambda = lambda)
  )
  best_models_lasso_BPRS[[fold]] <- coef(lasso.BPRS$finalModel, lasso.BPRS$bestTune$lambda)
  preds <- lasso.BPRS %>% predict(test_data)
  
  # Make predictions
  if (fold == 1){ 
    cv_preds_lasso_BPRS <- data.frame(true = test_data$BPRS, pred = preds)
  }else{ 
    cv_preds_lasso_BPRS <- rbind(cv_preds_lasso_BPRS, data.frame(true = test_data$BPRS, pred = preds))}
  
  RMSE_cv_lasso_BPRS[fold] <- RMSE(preds, test_data$BPRS)
  
}
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.
compare_BPRS$lasso[1] <- rowMeans(RMSE_cv_lasso_BPRS)
compare_BPRS$lasso[2] <- cor(cv_preds_lasso_BPRS$true, cv_preds_lasso_BPRS$pred)^2
lasso_BPRS_best_coeffs <- model_comparison_template

for (fold in seq.int(1,10)){ 
  coefs <- rownames(best_models_lasso_BPRS[[fold]])
  split_coef <- strsplit(coefs, split="_")
  for (row in seq.int(2,length(coefs))){
    if (length(split_coef[[row]]) > 1){
      PC <- as.numeric(strsplit(split_coef[[row]][[1]], split="PC")[[1]][[2]])
      if (best_models_lasso_BPRS[[fold]][row]!= 0){
        lasso_BPRS_best_coeffs[[split_coef[[row]][[2]]]][fold,PC] <- "x"
      }
    }
  }
  
  if ("GENDER" %in% coefs){
    lasso_BPRS_best_coeffs[["demo"]]$GENDER <- "x"
  }
  if ("AGE" %in% coefs){
    lasso_BPRS_best_coeffs[["demo"]]$AGE <- "x"
  }
  if ("SCANNER" %in% coefs){
    lasso_BPRS_best_coeffs[["demo"]]$SCANNER <- "x"
  }
}

ElasticNet

  • PC3_sim: High load correct trials in DFR, encoding to correct delay trials in high load correct trials in fusiform
  • PC7_MVPA: Individual trials, low correct and high incorrect, but most variance comes from high incorrect trials in fusiform during delay from averages
  • PC1_EEG: Alpha and beta at high load during probe
  • PC7_EEG: P3, n170 amplitude; low gamma
  • PC9_EEG: P3 component
  • PC5_RS: Within visual connectivity and participation coefficient, within CO connectivity
  • PC4_struct: Temporal pole, entorhinal
  • PC7_struct: More subcortical
  • PC14_struct: pars orbitalis, cingulate, accumbens
  • PC15_struct: left hemisphere subcortical, pars triangularis
  • Gender
  • Age
  • Scanner
# set up outer cross validation 
split <- vfold_cv(reg_data,v=10)

best_models_elastic_BPRS <- list()
RMSE_cv_elastic_BPRS <- data.frame(matrix(ncol=10, nrow=1))

for (fold in seq.int(1,10)){ 
  
  # set up training and test set for fold
  train_data <-  analysis(split$splits[[fold]])
  train_data <- train_data[,c(4, 5:70)]
  test_data <- assessment(split$splits[[fold]])
  test_data <- test_data[,c(4, 5:70)]
  train_data[,1:67] <- sapply(train_data[,1:67],scale)
  train_data <- data.frame(train_data)
  test_data[,1:67] <- sapply(test_data[,1:67], scale)
  test_data <- data.frame(test_data)
  
  elastic.BPRS <- train(
    BPRS ~., data = train_data, method = "glmnet",
    trControl = trainControl("cv", number = 10),
    tuneLength = 10
  )
  best_models_elastic_BPRS[[fold]] <- elastic.BPRS$finalModel
  preds <- elastic.BPRS %>% predict(test_data)
  
  # Make predictions
  if (fold == 1){ 
    cv_preds_elastic_BPRS <- data.frame(true = test_data$BPRS, pred = preds)
  }else{ 
    cv_preds_elastic_BPRS <- rbind(cv_preds_elastic_BPRS, data.frame(true = test_data$BPRS, pred = preds))}
  
  RMSE_cv_elastic_BPRS[fold] <- RMSE(preds, test_data$BPRS)
  
}
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.
compare_BPRS$elastic[1] <- rowMeans(RMSE_cv_elastic_BPRS)
compare_BPRS$elastic[2] <- cor(cv_preds_elastic_BPRS$true, cv_preds_elastic_BPRS$pred)^2
elastic_BPRS_best_coeffs <- model_comparison_template

for (fold in seq.int(1,10)){ 
  coefs <- coef(best_models_elastic_BPRS[[fold]], s = best_models_elastic_BPRS[[fold]]$tuneValue$lambda)
  split_coef <- strsplit(rownames(coefs), split="_")
  for (row in seq.int(2,length(coefs))){
    if (length(split_coef[[row]]) > 1){
      PC <- as.numeric(strsplit(split_coef[[row]][[1]], split="PC")[[1]][[2]])
      if (coefs[row]!= 0){
        elastic_BPRS_best_coeffs[[split_coef[[row]][[2]]]][fold,PC] <- "x"
      }
    }
  }
  
  if ("GENDER" %in% rownames(coefs)){
    elastic_BPRS_best_coeffs[["demo"]]$GENDER <- "x"
  }
  if ("AGE" %in% rownames(coefs)){
    elastic_BPRS_best_coeffs[["demo"]]$AGE <- "x"
  }
  if ("SCANNER" %in% rownames(coefs)){
    elastic_BPRS_best_coeffs[["demo"]]$SCANNER <- "x"
  }
}

Compare Models

Here, stepwise regression is the best model that performs feature selection - it explains 9% of the variance in BPRS, although it does have the highest RMSE. Again, Ridge explains the most variance (10%).

compare_BPRS
##            stepwise      ridge      lasso    elastic
## RMSE     1.06439845 0.93569897 0.95144131 0.97678297
## Rsquared 0.09644658 0.06325695 0.03558682 0.01704404

Predicting Accuracy

Stepwise regression

  • PC1_univ: encoding load effects and the combined encoding+encoding-delay in the PFC regions
  • PC2_univ: delay load effect
  • PC3_univ: encoding - delay in parietal regions and fusiform activity during delay
  • PC1_sim: Similarity within trial
  • PC2_MVPA: DFR during encoding
  • PC4_EEG: Low gamma
  • PC6_EEG: N170 amplitude and high load Oz power across frequency band
  • PC9_EEG: P3 component
  • PC11_EEG: Oz at low load, mid occipital at high load
  • PC3_FA: cingulum
  • PC2_struct: Subcortical structures (including cerebellum)
  • Gender
  • Age
  • Scanner
# set up outer cross validation 
split <- vfold_cv(reg_data,v=10)

best_models_step_high_acc <- list()
RMSE_cv_step_high_acc <- data.frame(matrix(ncol=10, nrow=1))

for (fold in seq.int(1,10)){ 
  
  # set up training and test set for fold
  train_data <-  analysis(split$splits[[fold]])
  train_data <- train_data[,c(3, 5:70)]
  test_data <- assessment(split$splits[[fold]])
  test_data <- test_data[,c(3, 5:70)]
  train_data[,1:67] <- sapply(train_data[,1:67],scale)
  train_data <- data.frame(train_data)
  test_data[,1:67] <- sapply(test_data[,1:67], scale)
  test_data <- data.frame(test_data)
  
  step.high_acc <- train(high_acc ~., data = train_data,
                         method = "lmStepAIC", 
                         trControl = trainControl("cv", number = 10),
                         trace = FALSE
  )
  best_models_step_high_acc[[fold]] <- step.high_acc$finalModel
  preds <- step.high_acc %>% predict(test_data)
  
  # Make predictions
  if (fold == 1){ 
    cv_preds_step_high_acc <- data.frame(true = test_data$high_acc, pred = preds)
  }else{ 
    cv_preds_step_high_acc <- rbind(cv_preds_step_high_acc, data.frame(true = test_data$high_acc, pred = preds))}
  
  RMSE_cv_step_high_acc[fold] <- RMSE(preds, test_data$high_acc)
  
}
step_high_acc_best_coeffs <- model_comparison_template

for (fold in seq.int(1,10)){ 
  coefs <- rownames(data.frame(best_models_step_high_acc[[fold]]$coefficients))
  split_coef <- strsplit(coefs, split="_")
  for (row in seq.int(2,length(coefs))){
    if (length(split_coef[[row]]) > 1){
      PC <- as.numeric(strsplit(split_coef[[row]][[1]], split="PC")[[1]][[2]])
      step_high_acc_best_coeffs[[split_coef[[row]][[2]]]][fold,PC] <- "x"
    }
  }
  
  if ("GENDER" %in% coefs){
    step_high_acc_best_coeffs[["demo"]]$GENDER <- "x"
  }
  if ("AGE" %in% coefs){
    step_high_acc_best_coeffs[["demo"]]$AGE <- "x"
  }
  if ("SCANNER" %in% coefs){
    step_high_acc_best_coeffs[["demo"]]$SCANNER <- "x"
  }
}
compare_high_acc <- data.frame(matrix(nrow=2, ncol=4))
colnames(compare_high_acc) <- c("stepwise", "ridge", "lasso", "elastic")
rownames(compare_high_acc) <- c("RMSE", "Rsquared")

compare_high_acc$stepwise[1] <- rowMeans(RMSE_cv_step_high_acc)
compare_high_acc$stepwise[2] <- cor(cv_preds_step_high_acc$true, cv_preds_step_high_acc$pred)^2

Ridge regression

# set up outer cross validation 
split <- vfold_cv(reg_data,v=10)

best_models_ridge_high_acc <- list()
RMSE_cv_high_acc <- data.frame(matrix(ncol=10, nrow=1))

for (fold in seq.int(1,10)){ 
  
  # set up training and test set for fold
  train_data <-  analysis(split$splits[[fold]])
  train_data <- train_data[,c(3, 5:70)]
  test_data <- assessment(split$splits[[fold]])
  test_data <- test_data[,c(3, 5:70)]
  train_data[,1:67] <- sapply(train_data[,1:67],scale)
  train_data <- data.frame(train_data)
  test_data[,1:67] <- sapply(test_data[,1:67], scale)
  test_data <- data.frame(test_data)
  ridge.high_acc <- train(
    high_acc ~., data = train_data, method = "glmnet",
    trControl = trainControl("cv", number = 10),
    tuneGrid = expand.grid(alpha = 0, lambda = lambda)
  )
  best_models_ridge_high_acc[[fold]] <- coef(ridge.high_acc$finalModel, ridge.high_acc$bestTune$lambda)
  preds <- ridge.high_acc %>% predict(test_data)
  
  # Make predictions
  if (fold == 1){ 
    cv_preds_high_acc <- data.frame(true = test_data$high_acc, pred = preds)
  }else{ 
    cv_preds_high_acc <- rbind(cv_preds_high_acc, data.frame(true = test_data$high_acc, pred = preds))}
  
  RMSE_cv_high_acc[fold] <- RMSE(preds, test_data$high_acc)
  
}
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.
compare_high_acc$ridge[1] <- rowMeans(RMSE_cv_high_acc)
compare_high_acc$ridge[2] <- cor(cv_preds_high_acc$true, cv_preds_high_acc$pred)^2

LASSO Regression

  • PC1_univ: encoding load effects and the combined encoding+encoding-delay in the PFC regions
  • PC2_univ: delay load effect
  • PC3_univ: encoding - delay in parietal regions and fusiform activity during delay
  • PC1_sim: Similarity within trial
  • PC5_MVPA: Fusiform encoding during template – most variance comes from high incorrect trials during individual trials
  • PC10_MVPA: Correct averaged from template trials during delay and probe - mostly DFR during probe
  • PC12_MVPA: Mostly correct trials from templates in the fusiform - most variance explained by encoding and probe
  • PC2_EEG: ERSPs at probe during low load trials; specifically, low gamma
  • PC4_EEG: Low gamma
  • PC6_EEG: N170 amplitude and high load Oz power across frequency band
  • PC7_EEG: P3, n170 amplitude; low gamma
  • PC9_EEG: P3 component
  • PC11_EEG: Oz at low load, mid occipital at high load
  • PC2_struct: Subcortical structures (including cerebellum)
  • PC3_struct: Cuneus, pericalcarine, superior parietal, lingual
  • PC17_struct: banks sts, some temporal and parietal cortex
  • Gender
  • Age
  • Scanner
# set up outer cross validation 
split <- vfold_cv(reg_data,v=10)

best_models_lasso_high_acc <- list()
RMSE_cv_lasso_high_acc <- data.frame(matrix(ncol=10, nrow=1))

for (fold in seq.int(1,10)){ 
  
  # set up training and test set for fold
  train_data <-  analysis(split$splits[[fold]])
  train_data <- train_data[,c(3, 5:70)]
  test_data <- assessment(split$splits[[fold]])
  test_data <- test_data[,c(3, 5:70)]
  train_data[,1:67] <- sapply(train_data[,1:67],scale)
  train_data <- data.frame(train_data)
  test_data[,1:67] <- sapply(test_data[,1:67], scale)
  test_data <- data.frame(test_data)
  lasso.high_acc <- train(
    high_acc ~., data =train_data, method = "glmnet",
    trControl = trainControl("cv", number = 10),
    tuneGrid = expand.grid(alpha = 1, lambda = lambda)
  )
  best_models_lasso_high_acc[[fold]] <- coef(lasso.high_acc$finalModel, lasso.high_acc$bestTune$lambda)
  preds <- lasso.high_acc %>% predict(test_data)
  
  # Make predictions
  if (fold == 1){ 
    cv_preds_lasso_high_acc <- data.frame(true = test_data$high_acc, pred = preds)
  }else{ 
    cv_preds_lasso_high_acc <- rbind(cv_preds_lasso_high_acc, data.frame(true = test_data$high_acc, pred = preds))}
  
  RMSE_cv_lasso_high_acc[fold] <- RMSE(preds, test_data$high_acc)
  
}
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.
compare_high_acc$lasso[1] <- rowMeans(RMSE_cv_lasso_high_acc)
compare_high_acc$lasso[2] <- cor(cv_preds_lasso_high_acc$true, cv_preds_lasso_high_acc$pred)^2
lasso_high_acc_best_coeffs <- model_comparison_template

for (fold in seq.int(1,10)){ 
  coefs <- rownames(best_models_lasso_high_acc[[fold]])
  split_coef <- strsplit(coefs, split="_")
  for (row in seq.int(2,length(coefs))){
    if (length(split_coef[[row]]) > 1){
      PC <- as.numeric(strsplit(split_coef[[row]][[1]], split="PC")[[1]][[2]])
      if (best_models_lasso_high_acc[[fold]][row]!= 0){
        lasso_high_acc_best_coeffs[[split_coef[[row]][[2]]]][fold,PC] <- "x"
      }
    }
  }
  
  if ("GENDER" %in% coefs){
    lasso_high_acc_best_coeffs[["demo"]]$GENDER <- "x"
  }
  if ("AGE" %in% coefs){
    lasso_high_acc_best_coeffs[["demo"]]$AGE <- "x"
  }
  if ("SCANNER" %in% coefs){
    lasso_high_acc_best_coeffs[["demo"]]$SCANNER <- "x"
  }
}

Elastic Net

When we look at ElasticNet, all coefficients play a role. The most important ones seem to be:

  • PC1_univ: encoding load effects and the combined encoding+encoding-delay in the PFC regions
  • PC2_univ: delay load effect
  • PC3_univ: encoding - delay in parietal regions and fusiform activity during delay
  • PC1_sim: Similarity within trial
  • PC5_sim: Correlation within low correct and high incorrect trials in DFR mask
  • PC5_MVPA: Fusiform encoding during template – most variance comes from high incorrect trials during individual trials
  • PC10_MVPA: Correct averaged from template trials during delay and probe - mostly DFR during probe
  • PC12_MVPA: Mostly correct trials from templates in the fusiform - most variance explained by encoding and probe
  • PC2_EEG: ERSPs at probe during low load trials; specifically, low gamma
  • PC4_EEG: Low gamma
  • PC6_EEG: N170 amplitude and high load Oz power across frequency band
  • PC7_EEG: P3, n170 amplitude; low gamma
  • PC9_EEG: P3 component
  • PC11_EEG: Oz at low load, mid occipital at high load
  • PC2_struct: Subcortical structures (including cerebellum)
  • PC3_struct: Cuneus, pericalcarine, superior parietal, lingual
  • PC17_struct: banks sts, some temporal and parietal cortex
  • Gender
  • Scanner
  • Age
# set up outer cross validation 
split <- vfold_cv(reg_data,v=10)

best_models_elastic_high_acc <- list()
RMSE_cv_elastic_high_acc <- data.frame(matrix(ncol=10, nrow=1))

for (fold in seq.int(1,10)){ 
  
  # set up training and test set for fold
  train_data <-  analysis(split$splits[[fold]])
  train_data <- train_data[,c(3, 5:70)]
  test_data <- assessment(split$splits[[fold]])
  test_data <- test_data[,c(3, 5:70)]
  train_data[,1:67] <- sapply(train_data[,1:67],scale)
  train_data <- data.frame(train_data)
  test_data[,1:67] <- sapply(test_data[,1:67], scale)
  test_data <- data.frame(test_data)
  
  elastic.high_acc <- train(
    high_acc ~., data = train_data, method = "glmnet",
    trControl = trainControl("cv", number = 10),
    tuneLength = 10
  )
  best_models_elastic_high_acc[[fold]] <- elastic.high_acc$finalModel
  preds <- elastic.high_acc %>% predict(test_data)
  
  # Make predictions
  if (fold == 1){ 
    cv_preds_elastic_high_acc <- data.frame(true = test_data$high_acc, pred = preds)
  }else{ 
    cv_preds_elastic_high_acc <- rbind(cv_preds_elastic_high_acc, data.frame(true = test_data$high_acc, pred = preds))}
  
  RMSE_cv_elastic_high_acc[fold] <- RMSE(preds, test_data$high_acc)
  
}
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.
compare_high_acc$elastic[1] <- rowMeans(RMSE_cv_elastic_high_acc)
compare_high_acc$elastic[2] <- cor(cv_preds_elastic_high_acc$true, cv_preds_elastic_high_acc$pred)^2
elastic_high_acc_best_coeffs <- model_comparison_template

for (fold in seq.int(1,10)){ 
  coefs <- coef(best_models_elastic_high_acc[[fold]], s = best_models_elastic_high_acc[[fold]]$tuneValue$lambda)
  split_coef <- strsplit(rownames(coefs), split="_")
  for (row in seq.int(2,length(coefs))){
    if (length(split_coef[[row]]) > 1){
      PC <- as.numeric(strsplit(split_coef[[row]][[1]], split="PC")[[1]][[2]])
      if (coefs[row]!= 0){
        elastic_high_acc_best_coeffs[[split_coef[[row]][[2]]]][fold,PC] <- "x"
      }
    }
  }
  
  if ("GENDER" %in% rownames(coefs)){
    elastic_high_acc_best_coeffs[["demo"]]$GENDER <- "x"
  }
  if ("AGE" %in% rownames(coefs)){
    elastic_high_acc_best_coeffs[["demo"]]$AGE <- "x"
  }
  if ("SCANNER" %in% rownames(coefs)){
    elastic_high_acc_best_coeffs[["demo"]]$SCANNER <- "x"
  }
}

Compare models

Here, ElasticNet is our best model with feature selection, explaining 12% of variance in accuracy at high load. Just as in the previous behavioral measures, Ridge regression performs best, explaining 14% of the variance.

compare_high_acc
##            stepwise     ridge     lasso   elastic
## RMSE     1.03781127 0.9009103 0.9082412 0.9043657
## Rsquared 0.08459256 0.1321430 0.1168125 0.1250408